library(knitr)
source("../R/SFA.ExtractTopFeatures.R")
source("../R/SFA.TopFeaturesPerFac.R")

We annotate the top genes of each factor from the GTEx sparse factor analysis (SFA).

GTEx 2013 Factor analysis (sparse loadings: sqrt counts)

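# SFA outputs: loadings (one row per sample, one column per factor) and the transposed factor matrix
lambda_out <- read.table("../sfa_outputs/GTEX2013/counts_sqrt_gtex/counts_sqrt_gtex_lambda.out")
f_out <- t(read.table("../sfa_outputs/GTEX2013/counts_sqrt_gtex/counts_sqrt_gtex_F.out"))

gene_names <- as.vector(as.matrix(read.table("../sfa_inputs/gene_names_GTEX_V6.txt")))
# Keep the first 15 characters: the Ensembl gene ID without any trailing suffix
gene_names <- substring(gene_names,1,15)
xli  <-  gene_names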
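
The `substring(gene_names, 1, 15)` call relies on human Ensembl gene IDs being exactly 15 characters long (`ENSG` plus 11 digits). A minimal sketch of the same step on toy IDs, alongside a pattern-based alternative that does not depend on the fixed width. The version suffixes below are made up for illustration and are not taken from the input file.

```r
# Toy versioned Ensembl gene IDs (illustrative suffixes, not read from the input file)
ids <- c("ENSG00000171401.10", "ENSG00000170477.8", "ENSG00000163209.13")

# Fixed-width approach: keep the first 15 characters
by_width <- substring(ids, 1, 15)

# Pattern-based alternative: drop a trailing ".<digits>" if present
by_pattern <- sub("\\.[0-9]+$", "", ids)

# Both approaches agree on these IDs
stopifnot(identical(by_width, by_pattern))
print(by_pattern)
```

The pattern-based form leaves IDs without a suffix unchanged, which may be safer if the input mixes versioned and unversioned IDs.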

#indices_mat <- SFA.TopFeaturesPerFac(f_out, top_features = 100)

indices_mat <- SFA.ExtractTopFeatures(f_out, top_features = 100,
                      options = "min",mult.annotate = TRUE)

# One row per factor: the gene IDs of that factor's top features
gene_list <- do.call(rbind, lapply(seq_len(nrow(indices_mat)), function(x) gene_names[indices_mat[x,]]))
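
The internals of `SFA.ExtractTopFeatures` are not shown here, so the following is only a simplified stand-in for the general idea: for each factor, pick the features with the largest absolute values, then map the indices to names. The function `top_abs_features` and the toy matrix are hypothetical; the `options = "min"` and `mult.annotate` behaviour of the sourced function is not reproduced.

```r
# Hypothetical stand-in, NOT the sourced SFA.ExtractTopFeatures
top_abs_features <- function(f_mat, top_features = 2) {
  # f_mat: features in rows, factors in columns
  rows <- lapply(seq_len(ncol(f_mat)), function(k) {
    # Indices of the largest absolute values for factor k
    order(abs(f_mat[, k]), decreasing = TRUE)[seq_len(top_features)]
  })
  # Stack into a factors x top_features index matrix
  do.call(rbind, rows)
}

# Toy factor matrix: 4 genes x 2 factors
toy_f <- cbind(c(0.1, -2.0, 0.5, 1.5),
               c(3.0,  0.2, -0.4, -1.0))
toy_names <- paste0("gene", seq_len(nrow(toy_f)))

toy_idx <- top_abs_features(toy_f, top_features = 2)

# Same index-to-name mapping as used for gene_list
toy_genes <- do.call(rbind, lapply(seq_len(nrow(toy_idx)), function(k) toy_names[toy_idx[k, ]]))
print(toy_genes)
```

Each row of `toy_genes` corresponds to one factor, mirroring how `gene_list[1,]` is later used for Factor 1.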

SFA loadings plot

samples_id <- read.table("../sfa_inputs/samples_id.txt")

# Tissue label of each sample (third column of the sample table)
tissue_labels <- samples_id[ ,3]

tissue_levels <- unique(tissue_labels)


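# Group sizes in order of first appearance, so they line up with unique(tissue_labels);
# this assumes samples are stored contiguously by tissue. Bars span [i-1, i], so edges start at 0.
tissue_counts <- table(factor(tissue_labels, levels = unique(tissue_labels)))
cumsum_val <- c(0,cumsum(as.numeric(tissue_counts)))
cumsum_low <- cumsum_val[1:(length(cumsum_val)-1)]
cumsum_high <- cumsum_val[2:(length(cumsum_val))]
cumsum_mean <- 0.5*(cumsum_low+cumsum_high)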


for(k in 1:20){
  png(paste0("../sfa_outputs/GTEX2013_transpose/sfa-figures/sqrt_counts_sparse_load_loadings/gtex_sfa_loadings_",k,".png"), width=4, height=4, units="in", res=600)
  # Wide bottom margin leaves room for the vertical tissue labels
  par(mar=c(10,3,2,2))
  barplot(lambda_out[,k], axisnames=FALSE, space=0, border=NA,
          main=paste0("SFA on gtex expression: loading:", k),
          las=1, cex.axis=0.3, cex.main=0.4,
          ylim=c(min(lambda_out[,k]),max(lambda_out[,k])))
  axis(1, at=cumsum_mean, labels=unique(tissue_labels), las=2, cex.axis=0.3)
  abline(v=cumsum_high)
  dev.off()
}
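
The tick positions for the tissue labels and the vertical separators come from cumulative group sizes. A minimal sketch on toy labels, assuming unit-width bars with no spacing (so bar i spans [i-1, i]) and samples stored contiguously by tissue; the labels below are illustrative only.

```r
# Toy labels, stored contiguously by tissue (an assumption of this approach)
toy_labels <- c("Brain", "Brain", "Brain", "Skin", "Skin", "Lung")

# Count samples per tissue in order of first appearance
toy_counts <- table(factor(toy_labels, levels = unique(toy_labels)))

# Group edges on the x axis: 0, then the running total of group sizes
edges <- c(0, cumsum(as.numeric(toy_counts)))
low  <- edges[-length(edges)]   # left edge of each tissue block
high <- edges[-1]               # right edge of each tissue block
mid  <- 0.5 * (low + high)      # where the tissue label is drawn

print(data.frame(tissue = unique(toy_labels), low, high, mid))
```

Counting through a factor with explicit levels keeps the counts in the same order as `unique()`, whereas a plain `table()` on character labels sorts them alphabetically.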

Factor 1 Annotations

out <- mygene::queryMany(gene_list[1,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id name summary symbol query
3860 keratin 13 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. KRT13 ENSG00000171401
3851 keratin 4 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. KRT4 ENSG00000170477
6707 small proline rich protein 3 NA SPRR3 ENSG00000163209
6280 S100 calcium binding protein A9 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. S100A9 ENSG00000163220
ENSG00000229732 NA NA AC019349.5 ENSG00000229732
3853 keratin 6A The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. As many as six of this type II cytokeratin (KRT6) have been identified; the multiplicity of the genes is attributed to successive gene duplication events. The genes are expressed with family members KRT16 and/or KRT17 in the filiform papillae of the tongue, the stratified epithelial lining of oral mucosa and esophagus, the outer root sheath of hair follicles, and the glandular epithelia. This KRT6 gene in particular encodes the most abundant isoform. Mutations in these genes have been associated with pachyonychia congenita. In addition, peptides from the C-terminal region of the protein have antimicrobial activity against bacterial pathogens. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. KRT6A ENSG00000205420
301 annexin A1 This gene encodes a membrane-localized protein that binds phospholipids. This protein inhibits phospholipase A2 and has anti-inflammatory activity. Loss of function or expression of this gene has been detected in multiple tumors. ANXA1 ENSG00000135046
6279 S100 calcium binding protein A8 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and as a cytokine. Altered expression of this protein is associated with the disease cystic fibrosis. Multiple transcript variants encoding different isoforms have been found for this gene. S100A8 ENSG00000143546
49860 cornulin This gene encodes a member of the ‘fused gene’ family of proteins, which contain N-terminus EF-hand domains and multiple tandem peptide repeats. The encoded protein contains two EF-hand Ca2+ binding domains in its N-terminus and two glutamine- and threonine-rich 60 amino acid repeats in its C-terminus. This gene, also known as squamous epithelial heat shock protein 53, may play a role in the mucosal/epithelial immune response and epidermal differentiation. CRNN ENSG00000143536
51458 Rh family C glycoprotein NA RHCG ENSG00000140519
3852 keratin 5 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the basal layer of the epidermis with family member KRT14. Mutations in these genes have been associated with a complex of diseases termed epidermolysis bullosa simplex. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. KRT5 ENSG00000186081
1476 cystatin B The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and kininogens. This gene encodes a stefin that functions as an intracellular thiol protease inhibitor. The protein is able to form a dimer stabilized by noncovalent forces, inhibiting papain and cathepsins l, h and b. The protein is thought to play a role in protecting against the proteases leaking from lysosomes. Evidence indicates that mutations in this gene are responsible for the primary defects in patients with progressive myoclonic epilepsy (EPM1). CSTB ENSG00000160213
5493 periplakin The protein encoded by this gene is a component of desmosomes and of the epidermal cornified envelope in keratinocytes. The N-terminal domain of this protein interacts with the plasma membrane and its C-terminus interacts with intermediate filaments. Through its rod domain, this protein forms complexes with envoplakin. This protein may serve as a link between the cornified envelope and desmosomes as well as intermediate filaments. AKT1/PKB, a protein kinase mediating a variety of cell growth and survival signaling processes, is reported to interact with this protein, suggesting a possible role for this protein as a localization signal in AKT1-mediated signaling. PPL ENSG00000118898
2012 epithelial membrane protein 1 NA EMP1 ENSG00000134531
5730 prostaglandin D2 synthase The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may be also involved in the regulation of non-rapid eye movement sleep. PTGDS ENSG00000107317
4155 myelin basic protein The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. MBP ENSG00000197971
1475 cystatin A The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins, and kininogens. This gene encodes a stefin that functions as a cysteine protease inhibitor, forming tight complexes with papain and the cathepsins B, H, and L. The protein is one of the precursor proteins of cornified cell envelope in keratinocytes and plays a role in epidermal development and maintenance. Stefins have been proposed as prognostic and diagnostic tools for cancer. CSTA ENSG00000121552
2597 glyceraldehyde-3-phosphate dehydrogenase This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. GAPDH ENSG00000111640
1893 extracellular matrix protein 1 This gene encodes a soluble protein that is involved in endochondral bone formation, angiogenesis, and tumor biology. It also interacts with a variety of extracellular and structural proteins, contributing to the maintenance of skin integrity and homeostasis. Mutations in this gene are associated with lipoid proteinosis disorder (also known as hyalinosis cutis et mucosae or Urbach-Wiethe disease) that is characterized by generalized thickening of skin, mucosae and certain viscera. Alternatively spliced transcript variants encoding distinct isoforms have been described for this gene. ECM1 ENSG00000143369
4904 Y-box binding protein 1 This gene encodes a highly conserved cold shock domain protein that has broad nucleic acid binding properties. The encoded protein functions as both a DNA and RNA binding protein and has been implicated in numerous cellular processes including regulation of transcription and translation, pre-mRNA splicing, DNA reparation and mRNA packaging. This protein is also a component of messenger ribonucleoprotein (mRNP) complexes and may have a role in microRNA processing. This protein can be secreted through non-classical pathways and functions as an extracellular mitogen. Aberrant expression of the gene is associated with cancer proliferation in numerous tissues. This gene may be a prognostic marker for poor outcome and drug resistance in certain cancers. Alternate splicing results in multiple transcript variants. Pseudogenes of this gene are found on multiple chromosomes. YBX1 ENSG00000065978
11005 serine peptidase inhibitor, Kazal type 5 This gene encodes a multidomain serine protease inhibitor that contains 15 potential inhibitory domains. The encoded preproprotein is proteolytically processed to generate multiple protein products, which may exhibit unique activities and specificities. These proteins may play a role in skin and hair morphogenesis, as well as anti-inflammatory and antimicrobial protection of mucous epithelia. Mutations in this gene may result in Netherton syndrome, a disorder characterized by ichthyosis, defective cornification, and atopy. This gene is present in a gene cluster on chromosome 5. Alternative splicing results in multiple transcript variants. SPINK5 ENSG00000133710
72 actin, gamma 2, smooth muscle, enteric Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. ACTG2 ENSG00000163017
7053 transglutaminase 3 Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene consists of two polypeptide chains activated from a single precursor protein by proteolysis. The encoded protein is involved in the later stages of cell envelope formation in the epidermis and hair follicle. TGM3 ENSG00000125780
79026 AHNAK nucleoprotein NA AHNAK ENSG00000124942
8531 Y-box binding protein 3 NA YBX3 ENSG00000060138
4134 microtubule associated protein 4 The protein encoded by this gene is a major non-neuronal microtubule-associated protein. This protein contains a domain similar to the microtubule-binding domains of neuronal microtubule-associated protein (MAP2) and microtubule-associated protein tau (MAPT/TAU). This protein promotes microtubule assembly, and has been shown to counteract destabilization of interphase microtubule catastrophe promotion. Cyclin B was found to interact with this protein, which targets cell division cycle 2 (CDC2) kinase to microtubules. The phosphorylation of this protein affects microtubule properties and cell cycle progression. Multiple transcript variants encoding different isoforms have been found for this gene. MAP4 ENSG00000047849
57402 S100 calcium binding protein A14 This gene encodes a member of the S100 protein family which contains an EF-hand motif and binds calcium. The gene is located in a cluster of S100 genes on chromosome 1. Levels of the encoded protein have been found to be lower in cancerous tissue and associated with metastasis suggesting a tumor suppressor function (PMID: 19956863, 19351828). S100A14 ENSG00000189334
7812 cold shock domain containing E1 NA CSDE1 ENSG00000009307
3320 heat shock protein 90kDa alpha family class A member 1 The protein encoded by this gene is an inducible molecular chaperone that functions as a homodimer. The encoded protein aids in the proper folding of specific target proteins by use of an ATPase activity that is modulated by co-chaperones. Two transcript variants encoding different isoforms have been found for this gene. HSP90AA1 ENSG00000080824
488 ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 2 This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol into the sarcoplasmic reticulum lumen, and is involved in regulation of the contraction/relaxation cycle. Mutations in this gene cause Darier-White disease, also known as keratosis follicularis, an autosomal dominant skin disorder characterized by loss of adhesion between epidermal cells and abnormal keratinization. Alternative splicing results in multiple transcript variants encoding different isoforms. ATP2A2 ENSG00000174437
3880 keratin 19 The protein encoded by this gene is a member of the keratin family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. The type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. Unlike its related family members, this smallest known acidic cytokeratin is not paired with a basic cytokeratin in epithelial cells. It is specifically expressed in the periderm, the transiently superficial layer that envelopes the developing epidermis. The type I cytokeratins are clustered in a region of chromosome 17q12-q21. KRT19 ENSG00000171345
6698 small proline rich protein 1A NA SPRR1A ENSG00000169474
5213 phosphofructokinase, muscle Three phosphofructokinase isozymes exist in humans: muscle, liver and platelet. These isozymes function as subunits of the mammalian tetramer phosphofructokinase, which catalyzes the phosphorylation of fructose-6-phosphate to fructose-1,6-bisphosphate. Tetramer composition varies depending on tissue type. This gene encodes the muscle-type isozyme. Mutations in this gene have been associated with glycogen storage disease type VII, also known as Tarui disease. Alternatively spliced transcript variants have been described. PFKM ENSG00000152556
360 aquaporin 3 (Gill blood group) This gene encodes the water channel protein aquaporin 3. Aquaporins are a family of small integral membrane proteins related to the major intrinsic protein, also known as aquaporin 0. Aquaporin 3 is localized at the basal lateral membranes of collecting duct cells in the kidney. In addition to its water channel function, aquaporin 3 has been found to facilitate the transport of nonionic small solutes such as urea and glycerol, but to a smaller degree. It has been suggested that water channels can be functionally heterogeneous and possess water and solute permeation mechanisms. Alternative splicing of this gene results in multiple transcript variants encoding different isoforms. AQP3 ENSG00000165272
4070 tumor-associated calcium signal transducer 2 This intronless gene encodes a carcinoma-associated antigen. This antigen is a cell surface receptor that transduces calcium signals. Mutations of this gene have been associated with gelatinous drop-like corneal dystrophy. TACSTD2 ENSG00000184292
6700 small proline rich protein 2A NA SPRR2A ENSG00000241794
7314 ubiquitin B This gene encodes ubiquitin, one of the most conserved proteins known. Ubiquitin has a major role in targeting cellular proteins for degradation by the 26S proteasome. It is also involved in the maintenance of chromatin structure, the regulation of gene expression, and the stress response. Ubiquitin is synthesized as a precursor protein consisting of either polyubiquitin chains or a single ubiquitin moiety fused to an unrelated protein. This gene consists of three direct repeats of the ubiquitin coding sequence with no spacer sequence. Consequently, the protein is expressed as a polyubiquitin precursor with a final amino acid after the last repeat. An aberrant form of this protein has been detected in patients with Alzheimer’s disease and Down syndrome. Pseudogenes of this gene are located on chromosomes 1, 2, 13, and 17. Alternative splicing results in multiple transcript variants. UBB ENSG00000170315
3557 interleukin 1 receptor antagonist The protein encoded by this gene is a member of the interleukin 1 cytokine family. This protein inhibits the activities of interleukin 1, alpha (IL1A) and interleukin 1, beta (IL1B), and modulates a variety of interleukin 1 related immune and inflammatory responses. This gene and five other closely related cytokine genes form a gene cluster spanning approximately 400 kb on chromosome 2. A polymorphism of this gene is reported to be associated with increased risk of osteoporotic fractures and gastric cancer. Several alternatively spliced transcript variants encoding distinct isoforms have been reported. IL1RN ENSG00000136689
8087 FMR1 autosomal homolog 1 The protein encoded by this gene is an RNA binding protein that interacts with the functionally-similar proteins FMR1 and FXR2. These proteins shuttle between the nucleus and cytoplasm and associate with polyribosomes, predominantly with the 60S ribosomal subunit. Three transcript variants encoding different isoforms have been found for this gene. FXR1 ENSG00000114416
3866 keratin 15 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains and are clustered in a region on chromosome 17q21.2. KRT15 ENSG00000171346
4957 outer dense fiber of sperm tails 2 The outer dense fibers are cytoskeletal structures that surround the axoneme in the middle piece and principal piece of the sperm tail. The fibers function in maintaining the elastic structure and recoil of the sperm tail as well as in protecting the tail from shear forces during epididymal transport and ejaculation. Defects in the outer dense fibers lead to abnormal sperm morphology and infertility. This gene encodes one of the major outer dense fiber proteins. Alternative splicing results in multiple transcript variants. The longer transcripts, also known as ‘Cenexins’, encode proteins with a C-terminal extension that are differentially targeted to somatic centrioles and thought to be crucial for the formation of microtubule organizing centers. ODF2 ENSG00000136811
3895 kinectin 1 This gene encodes an integral membrane protein that is a member of the kinectin protein family. The encoded protein is primarily localized to the endoplasmic reticulum membrane. This protein binds kinesin and may be involved in intracellular organelle motility. This protein also binds translation elongation factor-delta and may be involved in the assembly of the elongation factor-1 complex. Alternate splicing results in multiple transcript variants of this gene. KTN1 ENSG00000126777
1191 clusterin The protein encoded by this gene is a secreted chaperone that can under some stress conditions also be found in the cell cytosol. It has been suggested to be involved in several basic biological events such as cell death, tumor progression, and neurodegenerative disorders. Alternate splicing results in both coding and non-coding variants. CLU ENSG00000120885
2810 stratifin NA SFN ENSG00000175793
7169 tropomyosin 2 (beta) This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. TPM2 ENSG00000198467
6282 S100 calcium binding protein A11 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in motility, invasion, and tubulin polymerization. Chromosomal rearrangements and altered expression of this gene have been implicated in tumor metastasis. S100A11 ENSG00000163191
808 calmodulin 3 (phosphorylase kinase, delta) NA CALM3 ENSG00000160014
805 calmodulin 2 (phosphorylase kinase, delta) This gene is a member of the calmodulin gene family. There are three distinct calmodulin genes dispersed throughout the genome that encode the identical protein, but differ at the nucleotide level. Calmodulin is a calcium binding protein that plays a role in signaling pathways, cell cycle progression and proliferation. Several infants with severe forms of long-QT syndrome (LQTS) who displayed life-threatening ventricular arrhythmias together with delayed neurodevelopment and epilepsy were found to have mutations in either this gene or another member of the calmodulin gene family (PMID:23388215). Mutations in this gene have also been identified in patients with less severe forms of LQTS (PMID:24917665), while mutations in another calmodulin gene family member have been associated with catecholaminergic polymorphic ventricular tachycardia (CPVT)(PMID:23040497), a rare disorder thought to be the cause of a significant fraction of sudden cardiac deaths in young individuals. Pseudogenes of this gene are found on chromosomes 10, 13, and 17. Alternative splicing results in multiple transcript variants encoding different isoforms. CALM2 ENSG00000160014
9659 phosphodiesterase 4D interacting protein The protein encoded by this gene serves to anchor phosphodiesterase 4D to the Golgi/centrosome region of the cell. Defects in this gene may be a cause of myeloproliferative disorder (MBD) associated with eosinophilia. Several transcript variants encoding different isoforms have been found for this gene. PDE4DIP ENSG00000178104
84033 obscurin, cytoskeletal calmodulin and titin-interacting RhoGEF The obscurin gene spans more than 150 kb, contains over 80 exons and encodes a protein of approximately 720 kDa. The encoded protein contains 68 Ig domains, 2 fibronectin domains, 1 calcium/calmodulin-binding domain, 1 RhoGEF domain with an associated PH domain, and 2 serine-threonine kinase domains. This protein belongs to the family of giant sarcomeric signaling proteins that includes titin and nebulin, and may have a role in the organization of myofibrils during assembly and may mediate interactions between the sarcoplasmic reticulum and myofibrils. Alternatively spliced transcript variants encoding different isoforms have been identified. OBSCN ENSG00000154358
3843 importin 5 Nucleocytoplasmic transport, a signal- and energy-dependent process, takes place through nuclear pore complexes embedded in the nuclear envelope. The import of proteins containing a nuclear localization signal (NLS) requires the NLS import receptor, a heterodimer of importin alpha and beta subunits also known as karyopherins. Importin alpha binds the NLS-containing cargo in the cytoplasm and importin beta docks the complex at the cytoplasmic side of the nuclear pore complex. In the presence of nucleoside triphosphates and the small GTP binding protein Ran, the complex moves into the nuclear pore complex and the importin subunits dissociate. Importin alpha enters the nucleoplasm with its passenger protein and importin beta remains at the pore. Interactions between importin beta and the FG repeats of nucleoporins are essential in translocation through the pore complex. The protein encoded by this gene is a member of the importin beta family. IPO5 ENSG00000065150
4118 mal, T-cell differentiation protein The protein encoded by this gene is a highly hydrophobic integral membrane protein belonging to the MAL family of proteolipids. The protein has been localized to the endoplasmic reticulum of T-cells and is a candidate linker protein in T-cell signal transduction. In addition, this proteolipid is localized in compact myelin of cells in the nervous system and has been implicated in myelin biogenesis and/or function. The protein plays a role in the formation, stabilization and maintenance of glycosphingolipid-enriched membrane microdomains. Down-regulation of this gene has been associated with a variety of human epithelial malignancies. Alternative splicing produces four transcript variants which vary from each other by the presence or absence of alternatively spliced exons 2 and 3. MAL ENSG00000172005
5108 pericentriolar material 1 The protein encoded by this gene is a component of centriolar satellites, which are electron dense granules scattered around centrosomes. Inhibition studies show that this protein is essential for the correct localization of several centrosomal proteins, and for anchoring microtubules to the centrosome. Chromosomal aberrations involving this gene are associated with papillary thyroid carcinomas and a variety of hematological malignancies, including atypical chronic myeloid leukemia and T-cell lymphoma. Multiple transcript variants encoding different isoforms have been found for this gene. PCM1 ENSG00000078674
226 aldolase, fructose-bisphosphate A The protein encoded by this gene, Aldolase A (fructose-bisphosphate aldolase), is a glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Three aldolase isozymes (A, B, and C), encoded by three different genes, are differentially expressed during development. Aldolase A is found in the developing embryo and is produced in even greater amounts in adult muscle. Aldolase A expression is repressed in adult liver, kidney and intestine and similar to aldolase C levels in brain and other nervous tissue. Aldolase A deficiency has been associated with myopathy and hemolytic anemia. Alternative splicing and alternative promoter usage results in multiple transcript variants. Related pseudogenes have been identified on chromosomes 3 and 10. ALDOA ENSG00000149925
7051 transglutaminase 1 The protein encoded by this gene is a membrane protein that catalyzes the addition of an alkyl group from an alkylamine to a glutamine residue of a protein, forming an alkylglutamine in the protein. This protein alkylation leads to crosslinking of proteins and catenation of polyamines to proteins. This gene contains either one or two copies of a 22 nt repeat unit in its 3’ UTR. Mutations in this gene have been associated with autosomal recessive lamellar ichthyosis (LI) and nonbullous congenital ichthyosiform erythroderma (NCIE). TGM1 ENSG00000092295
7170 tropomyosin 3 This gene encodes a member of the tropomyosin family of actin-binding proteins. Tropomyosins are dimers of coiled-coil proteins that provide stability to actin filaments and regulate access of other actin-binding proteins. Mutations in this gene result in autosomal dominant nemaline myopathy and other muscle disorders. This locus is involved in translocations with other loci, including anaplastic lymphoma receptor tyrosine kinase (ALK) and neurotrophic tyrosine kinase receptor type 1 (NTRK1), which result in the formation of fusion proteins that act as oncogenes. There are numerous pseudogenes for this gene on different chromosomes. Alternative splicing results in multiple transcript variants. TPM3 ENSG00000143549
682 basigin (Ok blood group) The protein encoded by this gene is a plasma membrane protein that is important in spermatogenesis, embryo implantation, neural network formation, and tumor progression. The encoded protein is also a member of the immunoglobulin superfamily. Multiple transcript variants encoding different isoforms have been found for this gene. BSG ENSG00000172270
3858 keratin 10 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. KRT10 ENSG00000186395
10933 mortality factor 4 like 1 NA MORF4L1 ENSG00000185787
2879 glutathione peroxidase 4 This gene encodes a member of the glutathione peroxidase protein family. Glutathione peroxidase catalyzes the reduction of hydrogen peroxide, organic hydroperoxide, and lipid peroxides by reduced glutathione and functions in the protection of cells against oxidative damage. Human plasma glutathione peroxidase has been shown to be a selenium-containing enzyme and the UGA codon is translated into a selenocysteine. The encoded protein has been identified as a moonlighting protein based on its ability to serve dual functions as a peroxidase as well as a structural protein in mature spermatozoa. Through alternative splicing and transcription initiation, rat produces proteins that localize to the nucleus, mitochondrion, and cytoplasm. In humans, alternative transcription initiation and the cleavage sites of the mitochondrial and nuclear transit peptides need to be experimentally verified. Alternative splicing results in multiple transcript variants. GPX4 ENSG00000167468
10749 kinesin family member 1C The protein encoded by this gene is a member of the kinesin-like protein family. The family members are microtubule-dependent molecular motors that transport organelles within cells and move chromosomes during cell division. Mutations in this gene are a cause of spastic ataxia 2, autosomal recessive. KIF1C ENSG00000129250
2947 glutathione S-transferase mu 3 (brain) Cytosolic and membrane-bound forms of glutathione S-transferase are encoded by two distinct supergene families. At present, eight distinct classes of the soluble cytoplasmic mammalian glutathione S-transferases have been identified: alpha, kappa, mu, omega, pi, sigma, theta and zeta. This gene encodes a glutathione S-transferase that belongs to the mu class. The mu class of enzymes functions in the detoxification of electrophilic compounds, including carcinogens, therapeutic drugs, environmental toxins and products of oxidative stress, by conjugation with glutathione. The genes encoding the mu class of enzymes are organized in a gene cluster on chromosome 1p13.3 and are known to be highly polymorphic. These genetic variations can change an individual’s susceptibility to carcinogens and toxins as well as affect the toxicity and efficacy of certain drugs. Mutations of this class mu gene have been linked with a slight increase in a number of cancers, likely due to exposure with environmental toxins. Alternative splicing results in multiple transcript variants. GSTM3 ENSG00000134202
140465 myosin light chain 6B Myosin is a hexameric ATPase cellular motor protein. It is composed of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable regulatory light chains. This gene encodes a myosin alkali light chain expressed in both slow-twitch skeletal muscle and in nonmuscle tissue. Alternative splicing results in multiple transcript variants. MYL6B ENSG00000196465
9320 thyroid hormone receptor interactor 12 NA TRIP12 ENSG00000153827
1410 crystallin alpha B Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. CRYAB ENSG00000109846
2706 gap junction protein beta 2 This gene encodes a member of the gap junction protein family. The gap junctions were first characterized by electron microscopy as regionally specialized structures on plasma membranes of contacting adherent cells. These structures were shown to consist of cell-to-cell channels that facilitate the transfer of ions and small molecules between cells. The gap junction proteins, also known as connexins, purified from fractions of enriched gap junctions from different tissues differ. According to sequence similarities at the nucleotide and amino acid levels, the gap junction proteins are divided into two categories, alpha and beta. Mutations in this gene are responsible for as much as 50% of pre-lingual, recessive deafness. GJB2 ENSG00000165474
2318 filamin C This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene. FLNC ENSG00000128591
7916 proline rich coiled-coil 2A A cluster of genes, BAT1-BAT5, has been localized in the vicinity of the genes for TNF alpha and TNF beta. These genes are all within the human major histocompatibility complex class III region. This gene has microsatellite repeats which are associated with the age-at-onset of insulin-dependent diabetes mellitus (IDDM) and possibly thought to be involved with the inflammatory process of pancreatic beta-cell destruction during the development of IDDM. This gene is also a candidate gene for the development of rheumatoid arthritis. Two transcript variants encoding the same protein have been found for this gene. PRRC2A ENSG00000204469
3312 heat shock protein family A (Hsp70) member 8 This gene encodes a member of the heat shock protein 70 family, which contains both heat-inducible and constitutively expressed members. This protein belongs to the latter group, which are also referred to as heat-shock cognate proteins. It functions as a chaperone, and binds to nascent polypeptides to facilitate correct folding. It also functions as an ATPase in the disassembly of clathrin-coated vesicles during transport of membrane components through the cell. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. HSPA8 ENSG00000109971
1992 serpin family B member 1 The protein encoded by this gene is a member of the serpin family of proteinase inhibitors. Members of this family maintain homeostasis by neutralizing overexpressed proteinase activity through their function as suicide substrates. This protein inhibits the neutrophil-derived proteinases neutrophil elastase, cathepsin G, and proteinase-3 and thus protects tissues from damage at inflammatory sites. Alternative splicing results in multiple transcript variants. SERPINB1 ENSG00000021355
8672 eukaryotic translation initiation factor 4 gamma 3 The protein encoded by this gene is thought to be part of the eIF4F protein complex, which is involved in mRNA cap recognition and transport of mRNAs to the ribosome. Interestingly, a microRNA (miR-520c-3p) has been found that negatively regulates synthesis of the encoded protein, and this leads to a global decrease in protein translation and cell proliferation. Therefore, this protein is a key component of the anti-tumor activity of miR-520c-3p. EIF4G3 ENSG00000075151
2335 fibronectin 1 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. FN1 ENSG00000115414
5315 pyruvate kinase, muscle This gene encodes a protein involved in glycolysis. The encoded protein is a pyruvate kinase that catalyzes the transfer of a phosphoryl group from phosphoenolpyruvate to ADP, generating ATP and pyruvate. This protein has been shown to interact with thyroid hormone and may mediate cellular metabolic effects induced by thyroid hormones. This protein has been found to bind Opa protein, a bacterial outer membrane protein involved in gonococcal adherence to and invasion of human cells, suggesting a role of this protein in bacterial pathogenesis. Several alternatively spliced transcript variants encoding a few distinct isoforms have been reported. PKM ENSG00000067225
8218 clathrin heavy chain like 1 This gene is a member of the clathrin heavy chain family and encodes a major protein of the polyhedral coat of coated pits and vesicles. Chromosomal aberrations involving this gene are associated with meningioma, DiGeorge syndrome, and velo-cardio-facial syndrome. Multiple transcript variants encoding different isoforms have been found for this gene. CLTCL1 ENSG00000070371
6272 sortilin 1 This gene encodes a member of the VPS10-related sortilin family of proteins. The encoded preproprotein is proteolytically processed by furin to generate the mature receptor. This receptor plays a role in the trafficking of different proteins to either the cell surface, or subcellular compartments such as lysosomes and endosomes. Expression levels of this gene may influence the risk of myocardial infarction in human patients. Alternative splicing results in multiple transcript variants. SORT1 ENSG00000134243
4779 nuclear factor, erythroid 2 like 1 This gene encodes a protein that is involved in globin gene expression in erythrocytes. Confusion has occurred in bibliographic databases due to the shared symbol of NRF1 for this gene, NFE2L1, and for ‘nuclear respiratory factor 1’ which has an official symbol of NRF1. NFE2L1 ENSG00000082641
65018 PTEN induced putative kinase 1 This gene encodes a serine/threonine protein kinase that localizes to mitochondria. It is thought to protect cells from stress-induced mitochondrial dysfunction. Mutations in this gene cause one form of autosomal recessive early-onset Parkinson disease. PINK1 ENSG00000158828
4077 NBR1, autophagy cargo receptor The protein encoded by this gene was originally identified as an ovarian tumor antigen monitored in ovarian cancer. The encoded protein contains a B-box/coiled-coil motif, which is present in many genes with transformation potential. It functions as a specific autophagy receptor for the selective autophagic degradation of peroxisomes by forming intracellular inclusions with ubiquitylated autophagic substrates. This gene is located on a region of chromosome 17q21.1 that is in close proximity to the BRCA1 tumor suppressor gene. Alternative splicing of this gene results in multiple transcript variants. NBR1 ENSG00000188554
5116 pericentrin The protein encoded by this gene binds to calmodulin and is expressed in the centrosome. It is an integral component of the pericentriolar material (PCM). The protein contains a series of coiled-coil domains and a highly conserved PCM targeting motif called the PACT domain near its C-terminus. The protein interacts with the microtubule nucleation component gamma-tubulin and is likely important to normal functioning of the centrosomes, cytoskeleton, and cell-cycle progression. Mutations in this gene cause Seckel syndrome-4 and microcephalic osteodysplastic primordial dwarfism type II. Two transcript variants encoding different isoforms have been found for this gene. PCNT ENSG00000160299
1291 collagen type VI alpha 1 The collagens are a superfamily of proteins that play a role in maintaining the integrity of various tissues. Collagens are extracellular matrix proteins and have a triple-helical domain as their common structural element. Collagen VI is a major structural component of microfibrils. The basic structural unit of collagen VI is a heterotrimer of the alpha1(VI), alpha2(VI), and alpha3(VI) chains. The alpha2(VI) and alpha3(VI) chains are encoded by the COL6A2 and COL6A3 genes, respectively. The protein encoded by this gene is the alpha 1 subunit of type VI collagen (alpha1(VI) chain). Mutations in the genes that code for the collagen VI subunits result in the autosomal dominant disorder, Bethlem myopathy. COL6A1 ENSG00000142156
3487 insulin like growth factor binding protein 4 This gene is a member of the insulin-like growth factor binding protein (IGFBP) family and encodes a protein with an IGFBP domain and a thyroglobulin type-I domain. The protein binds both insulin-like growth factors (IGFs) I and II and circulates in the plasma in both glycosylated and non-glycosylated forms. Binding of this protein prolongs the half-life of the IGFs and alters their interaction with cell surface receptors. IGFBP4 ENSG00000141753
3133 major histocompatibility complex, class I, E HLA-E belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. HLA-E binds a restricted subset of peptides derived from the leader peptides of other class I molecules. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon one encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region, and exons 6 and 7 encode the cytoplasmic tail. HLA-E ENSG00000204592
3306 heat shock protein family A (Hsp70) member 2 NA HSPA2 ENSG00000126803
7917 BCL2 associated athanogene 6 This gene was first characterized as part of a cluster of genes located within the human major histocompatibility complex class III region. This gene encodes a nuclear protein that is cleaved by caspase 3 and is implicated in the control of apoptosis. In addition, the protein forms a complex with E1A binding protein p300 and is required for the acetylation of p53 in response to DNA damage. Multiple transcript variants encoding different isoforms have been found for this gene. BAG6 ENSG00000204463
2670 glial fibrillary acidic protein This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. GFAP ENSG00000131095
4953 ornithine decarboxylase 1 This gene encodes the rate-limiting enzyme of the polyamine biosynthesis pathway which catalyzes ornithine to putrescine. The activity level for the enzyme varies in response to growth-promoting stimuli and exhibits a high turnover rate in comparison to other mammalian proteins. Originally localized to both chromosomes 2 and 7, the gene encoding this enzyme has been determined to be located on 2p25, with a pseudogene located on 7q31-qter. Multiple alternatively spliced transcript variants encoding distinct isoforms have been identified. ODC1 ENSG00000115758
64397 zinc finger protein 106 NA ZNF106 ENSG00000103994
57062 DEAD-box helicase 24 DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of this family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. This gene encodes a DEAD box protein, which shows little similarity to any of the other known human DEAD box proteins, but shows a high similarity to mouse Ddx24 at the amino acid level. DDX24 ENSG00000089737
477 ATPase Na+/K+ transporting subunit alpha 2 The protein encoded by this gene belongs to the family of P-type cation transport ATPases, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The catalytic subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes an alpha 2 subunit. Mutations in this gene result in familial basilar or hemiplegic migraines, and in a rare syndrome known as alternating hemiplegia of childhood. ATP1A2 ENSG00000018625
ENSG00000234964 fatty acid binding protein 5 pseudogene 7 NA FABP5P7 ENSG00000234964
8490 regulator of G-protein signaling 5 This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. RGS5 ENSG00000143248
7538 ZFP36 ring finger protein NA ZFP36 ENSG00000128016
23352 ubiquitin protein ligase E3 component n-recognin 4 The protein encoded by this gene is an E3 ubiquitin-protein ligase that interacts with the retinoblastoma-associated protein in the nucleus and with calcium-bound calmodulin in the cytoplasm. The encoded protein appears to be a cytoskeletal component in the cytoplasm and part of the chromatin scaffold in the nucleus. In addition, this protein is a target of the human papillomavirus type 16 E7 oncoprotein. UBR4 ENSG00000127481
3069 high density lipoprotein binding protein The protein encoded by this gene binds high density lipoprotein (HDL) and may function to regulate excess cholesterol levels in cells. The encoded protein also binds RNA and can induce heterochromatin formation. HDLBP ENSG00000115677
4898 nardilysin convertase This gene encodes a zinc-dependent endopeptidase that cleaves peptide substrates at the N-terminus of arginine residues in dibasic moieties and is a member of the peptidase M16 family. This protein interacts with heparin-binding EGF-like growth factor and plays a role in cell migration and proliferation. Multiple transcript variants encoding different isoforms have been found for this gene. NRDC ENSG00000078618
960 CD44 molecule (Indian blood group) The protein encoded by this gene is a cell-surface glycoprotein involved in cell-cell interactions, cell adhesion and migration. It is a receptor for hyaluronic acid (HA) and can also interact with other ligands, such as osteopontin, collagens, and matrix metalloproteinases (MMPs). This protein participates in a wide variety of cellular functions including lymphocyte activation, recirculation and homing, hematopoiesis, and tumor metastasis. Transcripts for this gene undergo complex alternative splicing that results in many functionally distinct isoforms, however, the full length nature of some of these variants has not been determined. Alternative splicing is the basis for the structural and functional diversity of this protein, and may be related to tumor metastasis. CD44 ENSG00000026508
51477 inositol-3-phosphate synthase 1 This gene encodes an inositol-3-phosphate synthase enzyme. The encoded protein plays a critical role in the myo-inositol biosynthesis pathway by catalyzing the rate-limiting conversion of glucose 6-phosphate to myoinositol 1-phosphate. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene, and a pseudogene of this gene is located on the short arm of chromosome 4. ISYNA1 ENSG00000105655
2194 fatty acid synthase The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. FASN ENSG00000169710
4928 nucleoporin 98 Nuclear pore complexes (NPCs) regulate the transport of macromolecules between the nucleus and cytoplasm, and are composed of many polypeptide subunits, many of which belong to the nucleoporin family. This gene belongs to the nucleoporin gene family and encodes a 186 kDa precursor protein that undergoes autoproteolytic cleavage to generate a 98 kDa nucleoporin and 96 kDa nucleoporin. The 98 kDa nucleoporin contains a Gly-Leu-Phe-Gly (GLGF) repeat domain and participates in many cellular processes, including nuclear import, nuclear export, mitotic progression, and regulation of gene expression. The 96 kDa nucleoporin is a scaffold component of the NPC. Proteolytic cleavage is important for targeting of the proteins to the NPC. Translocations between this gene and many other partner genes have been observed in different leukemias. Rearrangements typically result in chimeras with the N-terminal GLGF domain of this gene to the C-terminus of the partner gene. Alternative splicing results in multiple transcript variants encoding different isoforms, at least two of which are proteolytically processed. Some variants lack the region that encodes the 96 kDa nucleoporin. NUP98 ENSG00000110713
140576 S100 calcium binding protein A16 NA S100A16 ENSG00000188643
6281 S100 calcium binding protein A10 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in exocytosis and endocytosis. S100A10 ENSG00000197747
# Save the Ensembl IDs queried for factor 1, one per line.
write.table(out$query,
            file = paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_", 1, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
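
The write step above handles factor 1 only. Below is a minimal sketch of how the same export could be looped over every factor, assuming `gene_list` is a character matrix with one row per factor, as built earlier. The helper name `write_factor_genes` and the toy matrix are hypothetical, and the files go to a temporary directory rather than the project's `../utilities` path. It writes the extracted gene IDs directly, whereas the call above writes `out$query`.

```r
# Hypothetical helper: write one file of gene IDs per factor (row of gene_list).
write_factor_genes <- function(gene_list, out_dir, prefix = "gene_names_clus_") {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  paths <- character(nrow(gene_list))
  for (k in seq_len(nrow(gene_list))) {
    paths[k] <- file.path(out_dir, paste0(prefix, k, ".txt"))
    # One ID per line, no header, no quotes (same options as the call above).
    write.table(gene_list[k, ], file = paths[k],
                col.names = FALSE, row.names = FALSE, quote = FALSE)
  }
  invisible(paths)
}

# Toy stand-in for gene_list: 2 factors x 3 genes (IDs taken from the tables here).
toy_gene_list <- rbind(
  c("ENSG00000172005", "ENSG00000078674", "ENSG00000149925"),
  c("ENSG00000122304", "ENSG00000175646", "ENSG00000115414")
)
out_paths <- write_factor_genes(toy_gene_list, file.path(tempdir(), "sfa_gene_lists"))
factor2_ids <- readLines(out_paths[2])
```

Returning the file paths invisibly keeps the console quiet while still letting later steps read the lists back.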

Factor 2 Annotations

out <- mygene::queryMany(gene_list[2,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
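
The message above mentions duplicate or missing query terms. Below is a rough base R sketch of checking a query vector before it is sent, and of flagging returned rows that have no summary. `check_query_ids` and the toy data frame are hypothetical stand-ins (the summary text is abbreviated), and nothing here calls mygene.

```r
# Hypothetical helper: report duplicated and empty IDs in a query vector.
check_query_ids <- function(ids) {
  ids <- as.character(ids)
  list(
    duplicated = unique(ids[duplicated(ids)]),
    empty      = which(is.na(ids) | ids == "")
  )
}

# Toy query vector with one repeated ID (illustrative only).
toy_ids <- c("ENSG00000122304", "ENSG00000175646", "ENSG00000122304")
id_report <- check_query_ids(toy_ids)

# Toy stand-in for as.data.frame(out): find symbols whose summary is NA.
toy_out <- data.frame(
  symbol  = c("PRM2", "PRM1"),
  query   = c("ENSG00000122304", "ENSG00000175646"),
  summary = c("Protamines substitute for histones ...", NA),
  stringsAsFactors = FALSE
)
no_summary <- toy_out$symbol[is.na(toy_out$summary)]
```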
kable(as.data.frame(out))
symbol query summary name X_id
PRM2 ENSG00000122304 Protamines substitute for histones in the chromatin of sperm during the haploid phase of spermatogenesis, and are the major DNA-binding proteins in the nucleus of sperm in many vertebrates. They package the sperm DNA into a highly condensed complex in a volume less than 5% of a somatic cell nucleus. Many mammalian species have only one protamine (protamine 1); however, a few species, including human and mouse, have two. This gene encodes protamine 2, which is cleaved to give rise to a family of protamine 2 peptides. Alternatively spliced transcript variants have also been found for this gene. protamine 2 5620
PRM1 ENSG00000175646 NA protamine 1 5619
FN1 ENSG00000115414 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. fibronectin 1 2335
PHF7 ENSG00000010318 Spermatogenesis is a complex process regulated by extracellular and intracellular factors as well as cellular interactions among interstitial cells of the testis, Sertoli cells, and germ cells. This gene is expressed in the testis in Sertoli cells but not germ cells. The protein encoded by this gene contains plant homeodomain (PHD) finger domains, also known as leukemia associated protein (LAP) domains, believed to be involved in transcriptional regulation. The protein, which localizes to the nucleus of transfected cells, has been implicated in the transcriptional regulation of spermatogenesis. Alternate splicing results in multiple transcript variants of this gene. PHD finger protein 7 51533
KRT13 ENSG00000171401 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. keratin 13 3860
MB ENSG00000198125 This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. myoglobin 4151
DES ENSG00000175084 This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. desmin 1674
MYH7 ENSG00000092054 Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. myosin, heavy chain 7, cardiac muscle, beta 4625
TTN ENSG00000155657 This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. titin 7273
GFAP ENSG00000131095 This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. glial fibrillary acidic protein 2670
ODF2 ENSG00000136811 The outer dense fibers are cytoskeletal structures that surround the axoneme in the middle piece and principal piece of the sperm tail. The fibers function in maintaining the elastic structure and recoil of the sperm tail as well as in protecting the tail from shear forces during epididymal transport and ejaculation. Defects in the outer dense fibers lead to abnormal sperm morphology and infertility. This gene encodes one of the major outer dense fiber proteins. Alternative splicing results in multiple transcript variants. The longer transcripts, also known as ‘Cenexins’, encode proteins with a C-terminal extension that are differentially targeted to somatic centrioles and thought to be crucial for the formation of microtubule organizing centers. outer dense fiber of sperm tails 2 4957
LOC81691 ENSG00000005189 NA exonuclease NEF-sp 81691
KDM5B ENSG00000117139 This gene encodes a lysine-specific histone demethylase that belongs to the jumonji/ARID domain-containing family of histone demethylases. The encoded protein is capable of demethylating tri-, di- and monomethylated lysine 4 of histone H3. This protein plays a role in the transcriptional repression of certain tumor suppressor genes and is upregulated in certain cancer cells. This protein may also play a role in genome stability and DNA repair. Alternate splicing results in multiple transcript variants. lysine demethylase 5B 10765
ACTB ENSG00000075624 This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. actin, beta 60
CLPB ENSG00000162129 This gene belongs to the ATP-ases associated with diverse cellular activities (AAA+) superfamily. Members of this superfamily form ring-shaped homo-hexamers and have highly conserved ATPase domains that are involved in various processes including DNA replication, protein degradation and reactivation of misfolded proteins. All members of this family hydrolyze ATP through their AAA+ domains and use the energy generated through ATP hydrolysis to exert mechanical force on their substrates. In addition to an AAA+ domain, the protein encoded by this gene contains a C-terminal D2 domain, which is characteristic of the AAA+ subfamily of Caseinolytic peptidases to which this protein belongs. It cooperates with Hsp70 in the disaggregation of protein aggregates. Allelic variants of this gene are associated with 3-methylglutaconic aciduria, which causes cataracts and neutropenia. Alternative splicing results in multiple transcript variants. ClpB homolog, mitochondrial AAA ATPase chaperonin 81570
FLNC ENSG00000128591 This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene. filamin C 2318
LRWD1 ENSG00000161036 The protein encoded by this gene interacts with components of the origin recognition complex (ORC) and regulates the formation of the prereplicative complex. The encoded protein stabilizes the ORC and therefore aids in DNA replication. This protein is required for the G1/S phase transition of the cell cycle. In addition, the encoded protein binds to trimethylated histone H3 in heterochromatin and recruits the ORC and lysine methyltransferases, which help maintain the repressive heterochromatic state. Two transcript variants encoding different isoforms have been found for this gene. leucine rich repeats and WD repeat domain containing 1 222229
NEB ENSG00000183091 This gene encodes nebulin, a giant protein component of the cytoskeletal matrix that coexists with the thick and thin filaments within the sarcomeres of skeletal muscle. In most vertebrates, nebulin accounts for 3 to 4% of the total myofibrillar protein. The encoded protein contains approximately 30-amino acid long modules that can be classified into 7 types and other repeated modules. Protein isoform sizes vary from 600 to 800 kD due to alternative splicing that is tissue-, species-, and developmental stage-specific. Of the 183 exons in the nebulin gene, at least 43 are alternatively spliced, although exons 143 and 144 are not found in the same transcript. Of the several thousand transcript variants predicted for nebulin, the RefSeq Project has decided to create three representative RefSeq records. Mutations in this gene are associated with recessive nemaline myopathy. nebulin 4703
TSACC ENSG00000163467 NA TSSK6 activating cochaperone 128229
NPPA ENSG00000175206 The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. natriuretic peptide A 4878
GLUL ENSG00000135821 The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. glutamate-ammonia ligase 2752
GPX3 ENSG00000211445 This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. glutathione peroxidase 3 2878
HLA-B ENSG00000234745 HLA-B belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon 1 encodes the leader peptide, exon 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Hundreds of HLA-B alleles have been described. major histocompatibility complex, class I, B 3106
ACRBP ENSG00000111644 The protein encoded by this gene is similar to proacrosin binding protein sp32 precursor found in mouse, guinea pig, and pig. This protein is located in the sperm acrosome and is thought to function as a binding protein to proacrosin for packaging and condensation of the acrosin zymogen in the acrosomal matrix. This protein is a member of the cancer/testis family of antigens and it is found to be immunogenic. In normal tissues, this mRNA is expressed only in testis, whereas it is detected in a range of different tumor types such as bladder, breast, lung, liver, and colon. acrosin binding protein 84519
AC019349.5 ENSG00000229732 NA NA ENSG00000229732
POLR2A ENSG00000181222 This gene encodes the largest subunit of RNA polymerase II, the polymerase responsible for synthesizing messenger RNA in eukaryotes. The product of this gene contains a carboxy terminal domain composed of heptapeptide repeats that are essential for polymerase activity. These repeats contain serine and threonine residues that are phosphorylated in actively transcribing RNA polymerase. In addition, this subunit, in combination with several other polymerase subunits, forms the DNA binding domain of the polymerase, a groove in which the DNA template is transcribed into RNA. polymerase (RNA) II subunit A 5430
PPM1G ENSG00000115241 The protein encoded by this gene is a member of the PP2C family of Ser/Thr protein phosphatases. PP2C family members are known to be negative regulators of cell stress response pathways. This phosphatase is found to be responsible for the dephosphorylation of Pre-mRNA splicing factors, which is important for the formation of functional spliceosome. Studies of a similar gene in mice suggested a role of this phosphatase in regulating cell cycle progression. protein phosphatase, Mg2+/Mn2+ dependent 1G 5496
WDR62 ENSG00000075702 This gene is proposed to play a role in cerebral cortical development. Mutations in this gene have been associated with microencephaly, cortical malformations, and mental retardation. Alternative splicing results in multiple transcript variants. WD repeat domain 62 284403
GAPDHS ENSG00000105679 This gene encodes a protein belonging to the glyceraldehyde-3-phosphate dehydrogenase family of enzymes that play an important role in carbohydrate metabolism. Like its somatic cell counterpart, this sperm-specific enzyme functions in a nicotinamide adenine dinucleotide-dependent manner to remove hydrogen and add phosphate to glyceraldehyde 3-phosphate to form 1,3-diphosphoglycerate. During spermiogenesis, this enzyme may play an important role in regulating the switch between different energy-producing pathways, and it is required for sperm motility and male fertility. glyceraldehyde-3-phosphate dehydrogenase, spermatogenic 26330
TEX40 ENSG00000219435 NA testis expressed 40 ENSG00000219435
SPRR3 ENSG00000163209 NA small proline rich protein 3 6707
CCDC136 ENSG00000128596 NA coiled-coil domain containing 136 64753
TNNT2 ENSG00000118194 The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms, however, the full-length nature of some of these variants has not yet been determined. troponin T2, cardiac type 7139
S100A9 ENSG00000163220 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. S100 calcium binding protein A9 6280
LOC100129518 ENSG00000112096 NA uncharacterized LOC100129518 100129518
SOD2 ENSG00000112096 This gene is a member of the iron/manganese superoxide dismutase family. It encodes a mitochondrial protein that forms a homotetramer and binds one manganese ion per subunit. This protein binds to the superoxide byproducts of oxidative phosphorylation and converts them to hydrogen peroxide and diatomic oxygen. Mutations in this gene have been associated with idiopathic cardiomyopathy (IDC), premature aging, sporadic motor neuron disease, and cancer. Alternative splicing of this gene results in multiple transcript variants. A related pseudogene has been identified on chromosome 1. superoxide dismutase 2, mitochondrial 6648
YBX1 ENSG00000065978 This gene encodes a highly conserved cold shock domain protein that has broad nucleic acid binding properties. The encoded protein functions as both a DNA and RNA binding protein and has been implicated in numerous cellular processes including regulation of transcription and translation, pre-mRNA splicing, DNA reparation and mRNA packaging. This protein is also a component of messenger ribonucleoprotein (mRNP) complexes and may have a role in microRNA processing. This protein can be secreted through non-classical pathways and functions as an extracellular mitogen. Aberrant expression of the gene is associated with cancer proliferation in numerous tissues. This gene may be a prognostic marker for poor outcome and drug resistance in certain cancers. Alternate splicing results in multiple transcript variants. Pseudogenes of this gene are found on multiple chromosomes. Y-box binding protein 1 4904
DNAJB1 ENSG00000132002 This gene encodes a member of the DnaJ or Hsp40 (heat shock protein 40 kD) family of proteins. DNAJ family members are characterized by a highly conserved amino acid stretch called the ‘J-domain’ and function as one of the two major classes of molecular chaperones involved in a wide range of cellular events, such as protein folding and oligomeric protein complex assembly. The encoded protein is a molecular chaperone that stimulates the ATPase activity of Hsp70 heat-shock proteins in order to promote protein folding and prevent misfolded protein aggregation. Alternative splicing results in multiple transcript variants. DnaJ heat shock protein family (Hsp40) member B1 3337
HSPB8 ENSG00000152137 The protein encoded by this gene belongs to the superfamily of small heat-shock proteins containing a conservative alpha-crystallin domain at the C-terminal part of the molecule. The expression of this gene is induced by estrogen in estrogen receptor-positive breast cancer cells, and this protein also functions as a chaperone in association with Bag3, a stimulator of macroautophagy. Thus, this gene appears to be involved in regulation of cell proliferation, apoptosis, and carcinogenesis, and mutations in this gene have been associated with different neuromuscular diseases, including Charcot-Marie-Tooth disease. heat shock protein family B (small) member 8 26353
ISYNA1 ENSG00000105655 This gene encodes an inositol-3-phosphate synthase enzyme. The encoded protein plays a critical role in the myo-inositol biosynthesis pathway by catalyzing the rate-limiting conversion of glucose 6-phosphate to myoinositol 1-phosphate. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene, and a pseudogene of this gene is located on the short arm of chromosome 4. inositol-3-phosphate synthase 1 51477
ZMYND10 ENSG00000004838 This gene encodes a protein containing a MYND-type zinc finger domain that likely functions in assembly of the dynein motor. Mutations in this gene can cause primary ciliary dyskinesia. This gene is also considered a tumor suppressor gene and is often mutated, deleted, or hypermethylated and silenced in cancer cells. Alternative splicing results in multiple transcript variants. zinc finger MYND-type containing 10 51364
KIF2C ENSG00000142945 This gene encodes a kinesin-like protein that functions as a microtubule-dependent molecular motor. The encoded protein can depolymerize microtubules at the plus end, thereby promoting mitotic chromosome segregation. Alternative splicing results in multiple transcript variants. kinesin family member 2C 11004
FBXW5 ENSG00000159069 This gene encodes a member of the F-box protein family, members of which are characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into three classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene contains WD-40 domains, in addition to an F-box motif, so it belongs to the Fbw class. Alternatively spliced transcript variants encoding distinct isoforms have been identified for this gene, however, they were found to be nonsense-mediated mRNA decay (NMD) candidates, hence not represented. F-box and WD repeat domain containing 5 54461
TPM2 ENSG00000198467 This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. tropomyosin 2 (beta) 7169
SPARCL1 ENSG00000152583 NA SPARC like 1 8404
JUP ENSG00000173801 This gene encodes a major cytoplasmic protein which is the only known constituent common to submembranous plaques of both desmosomes and intermediate junctions. This protein forms distinct complexes with cadherins and desmosomal cadherins and is a member of the catenin family since it contains a distinct repeating amino acid motif called the armadillo repeat. Mutation in this gene has been associated with Naxos disease. Alternative splicing occurs in this gene; however, not all transcripts have been fully described. junction plakoglobin 3728
TUBA3D ENSG00000075886 This gene encodes a member of the alpha tubulin family. Tubulin is a major component of microtubules, which are composed of alpha- and beta-tubulin heterodimers and microtubule-associated proteins in the cytoskeleton. Microtubules maintain cellular structure, function in intracellular transport, and play a role in spindle formation during mitosis. tubulin alpha 3d 113457
ACTA1 ENSG00000143632 The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. actin, alpha 1, skeletal muscle 58
CPE ENSG00000109472 This gene encodes a member of the M14 family of metallocarboxypeptidases. The encoded preproprotein is proteolytically processed to generate the mature peptidase. This peripheral membrane protein cleaves C-terminal amino acid residues and is involved in the biosynthesis of peptide hormones and neurotransmitters, including insulin. This protein may also function independently of its peptidase activity, as a neurotrophic factor that promotes neuronal survival, and as a sorting receptor that binds to regulated secretory pathway proteins, including prohormones. Mutations in this gene are implicated in type 2 diabetes. carboxypeptidase E 1363
ALKBH7 ENSG00000125652 NA alkB homolog 7 84266
APOD ENSG00000189058 This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. apolipoprotein D 347
NUP98 ENSG00000110713 Nuclear pore complexes (NPCs) regulate the transport of macromolecules between the nucleus and cytoplasm, and are composed of many polypeptide subunits, many of which belong to the nucleoporin family. This gene belongs to the nucleoporin gene family and encodes a 186 kDa precursor protein that undergoes autoproteolytic cleavage to generate a 98 kDa nucleoporin and 96 kDa nucleoporin. The 98 kDa nucleoporin contains a Gly-Leu-Phe-Gly (GLFG) repeat domain and participates in many cellular processes, including nuclear import, nuclear export, mitotic progression, and regulation of gene expression. The 96 kDa nucleoporin is a scaffold component of the NPC. Proteolytic cleavage is important for targeting of the proteins to the NPC. Translocations between this gene and many other partner genes have been observed in different leukemias. Rearrangements typically result in chimeras with the N-terminal GLFG domain of this gene to the C-terminus of the partner gene. Alternative splicing results in multiple transcript variants encoding different isoforms, at least two of which are proteolytically processed. Some variants lack the region that encodes the 96 kDa nucleoporin. nucleoporin 98 4928
BAG6 ENSG00000204463 This gene was first characterized as part of a cluster of genes located within the human major histocompatibility complex class III region. This gene encodes a nuclear protein that is cleaved by caspase 3 and is implicated in the control of apoptosis. In addition, the protein forms a complex with E1A binding protein p300 and is required for the acetylation of p53 in response to DNA damage. Multiple transcript variants encoding different isoforms have been found for this gene. BCL2 associated athanogene 6 7917
MYBPC1 ENSG00000196091 This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. myosin binding protein C, slow type 4604
CMTM2 ENSG00000140932 This gene belongs to the chemokine-like factor gene superfamily, a novel family that links the chemokine and the transmembrane 4 superfamilies of signaling molecules. The protein encoded by this gene may play an important role in testicular development. CKLF like MARVEL transmembrane domain containing 2 146225
IGHG1 ENSG00000211896 NA immunoglobulin heavy constant gamma 1 (G1m marker) ENSG00000211896
AKAP3 ENSG00000111254 This gene encodes a member of A-kinase anchoring proteins (AKAPs), a family of functionally related proteins that target protein kinase A to discrete locations within the cell. The encoded protein is reported to participate in protein-protein interactions with the R-subunit of the protein kinase A as well as sperm-associated proteins. This protein is expressed in spermatozoa and localized to the acrosomal region of the sperm head as well as the length of the principal piece. It may function as a regulator of motility, capacitation, and the acrosome reaction. A-kinase anchoring protein 3 10566
KRT6A ENSG00000205420 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. As many as six of this type II cytokeratin (KRT6) have been identified; the multiplicity of the genes is attributed to successive gene duplication events. The genes are expressed with family members KRT16 and/or KRT17 in the filiform papillae of the tongue, the stratified epithelial lining of oral mucosa and esophagus, the outer root sheath of hair follicles, and the glandular epithelia. This KRT6 gene in particular encodes the most abundant isoform. Mutations in these genes have been associated with pachyonychia congenita. In addition, peptides from the C-terminal region of the protein have antimicrobial activity against bacterial pathogens. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 6A 3853
NPPA-AS1 ENSG00000242349 NA NPPA antisense RNA 1 ENSG00000242349
GPX4 ENSG00000167468 This gene encodes a member of the glutathione peroxidase protein family. Glutathione peroxidase catalyzes the reduction of hydrogen peroxide, organic hydroperoxide, and lipid peroxides by reduced glutathione and functions in the protection of cells against oxidative damage. Human plasma glutathione peroxidase has been shown to be a selenium-containing enzyme and the UGA codon is translated into a selenocysteine. The encoded protein has been identified as a moonlighting protein based on its ability to serve dual functions as a peroxidase as well as a structural protein in mature spermatozoa. Through alternative splicing and transcription initiation, rat produces proteins that localize to the nucleus, mitochondrion, and cytoplasm. In humans, alternative transcription initiation and the cleavage sites of the mitochondrial and nuclear transit peptides need to be experimentally verified. Alternative splicing results in multiple transcript variants. glutathione peroxidase 4 2879
EHD1 ENSG00000110047 This gene belongs to a highly conserved gene family encoding EPS15 homology (EH) domain-containing proteins. The protein-binding EH domain was first noted in EPS15, a substrate for the epidermal growth factor receptor. The EH domain has been shown to be an important motif in proteins involved in protein-protein interactions and in intracellular sorting. The protein encoded by this gene is thought to play a role in the endocytosis of IGF1 receptors. Alternatively spliced transcript variants have been found for this gene. EH domain containing 1 10938
CABYR ENSG00000154040 To reach fertilization competence, spermatozoa undergo a series of morphological and molecular maturational processes, termed capacitation, involving protein tyrosine phosphorylation and increased intracellular calcium. The protein encoded by this gene localizes to the principal piece of the sperm flagellum in association with the fibrous sheath and exhibits calcium-binding when phosphorylated during capacitation. A pseudogene on chromosome 3 has been identified for this gene. Alternatively spliced transcript variants encoding distinct protein isoforms have been found for this gene. calcium binding tyrosine phosphorylation regulated 26256
H2AFJ ENSG00000246705 Histones are basic nuclear proteins that are responsible for the nucleosome structure of the chromosomal fiber in eukaryotes. Nucleosomes consist of approximately 146 bp of DNA wrapped around a histone octamer composed of pairs of each of the four core histones (H2A, H2B, H3, and H4). The chromatin fiber is further compacted through the interaction of a linker histone, H1, with the DNA between the nucleosomes to form higher order chromatin structures. This gene is located on chromosome 12 and encodes a replication-independent histone that is a variant H2A histone. The protein is divergent at the C-terminus compared to the consensus H2A histone family member. This gene also encodes an antimicrobial peptide with antibacterial and antifungal activity. H2A histone family member J 55766
ANKRD1 ENSG00000148677 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. ankyrin repeat domain 1 27063
PROCA1 ENSG00000167525 NA protein interacting with cyclin A1 147011
RUVBL2 ENSG00000183207 This gene encodes the second human homologue of the bacterial RuvB gene. Bacterial RuvB protein is a DNA helicase essential for homologous recombination and DNA double-strand break repair. Functional analysis showed that this gene product has both ATPase and DNA helicase activities. This gene is physically linked to the CGB/LHB gene cluster on chromosome 19q13.3, and is very close (55 nt) to the LHB gene, in the opposite orientation. RuvB like AAA ATPase 2 10856
PYGM ENSG00000068976 This gene encodes a muscle enzyme involved in glycogenolysis. Highly similar enzymes encoded by different genes are found in liver and brain. Mutations in this gene are associated with McArdle disease (myophosphorylase deficiency), a glycogen storage disease of muscle. Alternative splicing results in multiple transcript variants. phosphorylase, glycogen, muscle 5837
COL3A1 ENSG00000168542 This gene encodes the pro-alpha1 chains of type III collagen, a fibrillar collagen that is found in extensible connective tissues such as skin, lung, uterus, intestine and the vascular system, frequently in association with type I collagen. Mutations in this gene are associated with Ehlers-Danlos syndrome types IV, and with aortic and arterial aneurysms. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. collagen type III alpha 1 chain 1281
KPNA2 ENSG00000182481 The import of proteins into the nucleus is a process that involves at least 2 steps. The first is an energy-independent docking of the protein to the nuclear envelope and the second is an energy-dependent translocation through the nuclear pore complex. Imported proteins require a nuclear localization sequence (NLS) which generally consists of a short region of basic amino acids or 2 such regions spaced about 10 amino acids apart. Proteins involved in the first step of nuclear import have been identified in different systems. These include the Xenopus protein importin and its yeast homolog, SRP1 (a suppressor of certain temperature-sensitive mutations of RNA polymerase I in Saccharomyces cerevisiae), which bind to the NLS. KPNA2 protein interacts with the NLSs of DNA helicase Q1 and SV40 T antigen and may be involved in the nuclear transport of proteins. KPNA2 also may play a role in V(D)J recombination. Alternative splicing results in multiple transcript variants. karyopherin subunit alpha 2 3838
TNFAIP2 ENSG00000185215 This gene was identified as a gene whose expression can be induced by the tumor necrosis factor alpha (TNF) in umbilical vein endothelial cells. The expression of this gene was shown to be induced by retinoic acid in a cell line expressing an oncogenic version of the retinoic acid receptor alpha fusion protein, which suggested that this gene may be a retinoic acid target gene in acute promyelocytic leukemia. TNF alpha induced protein 2 7127
LINC00467 ENSG00000153363 NA long intergenic non-protein coding RNA 467 ENSG00000153363
NUP214 ENSG00000126883 The nuclear pore complex is a massive structure that extends across the nuclear envelope, forming a gateway that regulates the flow of macromolecules between the nucleus and the cytoplasm. Nucleoporins are the main components of the nuclear pore complex in eukaryotic cells. This gene is a member of the FG-repeat-containing nucleoporins. The protein encoded by this gene is localized to the cytoplasmic face of the nuclear pore complex where it is required for proper cell cycle progression and nucleocytoplasmic transport. The 3’ portion of this gene forms a fusion gene with the DEK gene on chromosome 6 in a t(6,9) translocation associated with acute myeloid leukemia and myelodysplastic syndrome. Alternative splicing of this gene results in multiple transcript variants encoding different isoforms. nucleoporin 214 8021
TRIM28 ENSG00000130726 The protein encoded by this gene mediates transcriptional control by interaction with the Kruppel-associated box repression domain found in many transcription factors. The protein localizes to the nucleus and is thought to associate with specific chromatin regions. The protein is a member of the tripartite motif family. This tripartite motif includes three zinc-binding domains, a RING, a B-box type 1 and a B-box type 2, and a coiled-coil region. tripartite motif containing 28 10155
FBRS ENSG00000156860 Fibrosin is a lymphokine secreted by activated lymphocytes that induces fibroblast proliferation (Prakash and Robbins, 1998 [PubMed 9809749]). fibrosin 64319
HN1 ENSG00000189159 NA hematological and neurological expressed 1 51155
RHCG ENSG00000140519 NA Rh family C glycoprotein 51458
NUP188 ENSG00000095319 The nuclear pore complex (NPC) is found on the nuclear envelope and forms a gateway that regulates the flow of proteins and RNAs between the cytoplasm and nucleoplasm. The NPC is comprised of approximately 30 distinct proteins collectively known as nucleoporins. Nucleoporins are pore-complex-specific glycoproteins which often have cytoplasmically oriented O-linked N-acetylglucosamine residues and numerous repeats of the pentapeptide sequence XFXFG. However, the nucleoporin protein encoded by this gene does not contain the typical FG repeat sequences found in most vertebrate nucleoporins. This nucleoporin is thought to form part of the scaffold for the central channel of the nuclear pore. nucleoporin 188 23511
PILRB ENSG00000121716 The paired immunoglobin-like type 2 receptors consist of highly related activating and inhibitory receptors that are involved in the regulation of many aspects of the immune system. The paired immunoglobulin-like receptor genes are located in a tandem head-to-tail orientation on chromosome 7. This gene encodes the activating member of the receptor pair and contains a truncated cytoplasmic tail relative to its inhibitory counterpart (PILRA), that has a long cytoplasmic tail with immunoreceptor tyrosine-based inhibitory (ITIM) motifs. This gene is thought to have arisen from a duplication of the inhibitory PILRA gene and evolved to acquire its activating function. paired immunoglobin-like type 2 receptor beta 29990
ATP8B3 ENSG00000130270 The protein encoded by this gene belongs to the family of P-type cation transport ATPases, and to the subfamily of aminophospholipid-transporting ATPases. The aminophospholipid translocases transport phosphatidylserine and phosphatidylethanolamine from one side of a bilayer to the other. This gene encodes member 3 of phospholipid-transporting ATPase 8B; other members of this protein family are located on chromosomes 1, 15 and 18. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ATPase phospholipid transporting 8B3 148229
RGS5 ENSG00000143248 This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. regulator of G-protein signaling 5 8490
MLF1 ENSG00000178053 This gene encodes an oncoprotein which is thought to play a role in the phenotypic determination of hemopoietic cells. Translocations between this gene and nucleophosmin have been associated with myelodysplastic syndrome and acute myeloid leukemia. Multiple transcript variants encoding different isoforms have been found for this gene. myeloid leukemia factor 1 4291
AGBL5 ENSG00000084693 NA ATP/GTP binding protein-like 5 60509
CARHSP1 ENSG00000153048 NA calcium regulated heat stable protein 1 23589
S100A16 ENSG00000188643 NA S100 calcium binding protein A16 140576
GSTM3 ENSG00000134202 Cytosolic and membrane-bound forms of glutathione S-transferase are encoded by two distinct supergene families. At present, eight distinct classes of the soluble cytoplasmic mammalian glutathione S-transferases have been identified: alpha, kappa, mu, omega, pi, sigma, theta and zeta. This gene encodes a glutathione S-transferase that belongs to the mu class. The mu class of enzymes functions in the detoxification of electrophilic compounds, including carcinogens, therapeutic drugs, environmental toxins and products of oxidative stress, by conjugation with glutathione. The genes encoding the mu class of enzymes are organized in a gene cluster on chromosome 1p13.3 and are known to be highly polymorphic. These genetic variations can change an individual’s susceptibility to carcinogens and toxins as well as affect the toxicity and efficacy of certain drugs. Mutations of this class mu gene have been linked with a slight increase in a number of cancers, likely due to exposure with environmental toxins. Alternative splicing results in multiple transcript variants. glutathione S-transferase mu 3 (brain) 2947
FASN ENSG00000169710 The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. fatty acid synthase 2194
TBL2 ENSG00000106638 This gene encodes a member of the beta-transducin protein family. Most proteins of the beta-transducin family are involved in regulatory functions. This protein is possibly involved in some intracellular signaling pathway. This gene is deleted in Williams-Beuren syndrome, a developmental disorder caused by deletion of multiple genes at 7q11.23. transducin (beta)-like 2 26608
CCHCR1 ENSG00000204536 NA coiled-coil alpha-helical rod protein 1 54535
MYH9 ENSG00000100345 This gene encodes a conventional non-muscle myosin; this protein should not be confused with the unconventional myosin-9a or 9b (MYO9A or MYO9B). The encoded protein is a myosin IIA heavy chain that contains an IQ domain and a myosin head-like domain which is involved in several important functions, including cytokinesis, cell motility and maintenance of cell shape. Defects in this gene have been associated with non-syndromic sensorineural deafness autosomal dominant type 17, Epstein syndrome, Alport syndrome with macrothrombocytopenia, Sebastian syndrome, Fechtner syndrome and macrothrombocytopenia with progressive sensorineural deafness. myosin, heavy chain 9, non-muscle 4627
SLFNL1 ENSG00000171790 NA schlafen like 1 200172
FBXO24 ENSG00000106336 This gene encodes a member of the F-box protein family which is characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of the ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into 3 classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbxs class. Multiple transcript variants encoding different isoforms have been found for this gene. F-box protein 24 26261
IZUMO4 ENSG00000099840 NA IZUMO family member 4 113177
VARS ENSG00000204394 Aminoacyl-tRNA synthetases catalyze the aminoacylation of tRNA by their cognate amino acid. Because of their central role in linking amino acids with nucleotide triplets contained in tRNAs, aminoacyl-tRNA synthetases are thought to be among the first proteins that appeared in evolution. The protein encoded by this gene belongs to class-I aminoacyl-tRNA synthetase family and is located in the class III region of the major histocompatibility complex. valyl-tRNA synthetase 7407
XRCC6 ENSG00000196419 The p70/p80 autoantigen is a nuclear complex consisting of two subunits with molecular masses of approximately 70 and 80 kDa. The complex functions as a single-stranded DNA-dependent ATP-dependent helicase. The complex may be involved in the repair of nonhomologous DNA ends such as that required for double-strand break repair, transposition, and V(D)J recombination. High levels of autoantibodies to p70 and p80 have been found in some patients with systemic lupus erythematosus. X-ray repair cross complementing 6 2547
PLEKHG4 ENSG00000196155 The protein encoded by this gene can function as a guanine nucleotide exchange factor (GEF) and may play a role in intracellular signaling and cytoskeleton dynamics at the Golgi apparatus. Polymorphisms in the region of this gene have been found to be associated with spinocerebellar ataxia in some study populations. Alternative splicing results in multiple transcript variants. pleckstrin homology and RhoGEF domain containing G4 25894
SLC6A16 ENSG00000063127 SLC6A16 shows structural characteristics of an Na(+)- and Cl(-)-dependent neurotransmitter transporter, including 12 transmembrane (TM) domains, intracellular N and C termini, and large extracellular loops containing multiple N-glycosylation sites. solute carrier family 6 member 16 28968
HADHB ENSG00000138029 This gene encodes the beta subunit of the mitochondrial trifunctional protein, which catalyzes the last three steps of mitochondrial beta-oxidation of long chain fatty acids. The mitochondrial membrane-bound heterocomplex is composed of four alpha and four beta subunits, with the beta subunit catalyzing the 3-ketoacyl-CoA thiolase activity. The encoded protein can also bind RNA and decreases the stability of some mRNAs. The genes of the alpha and beta subunits of the mitochondrial trifunctional protein are located adjacent to each other in the human genome in a head-to-head orientation. Mutations in this gene result in trifunctional protein deficiency. Alternatively spliced transcript variants encoding different isoforms have been described. hydroxyacyl-CoA dehydrogenase/3-ketoacyl-CoA thiolase/enoyl-CoA hydratase (trifunctional protein), beta subunit 3032
KDM2A ENSG00000173120 This gene encodes a member of the F-box protein family which is characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into 3 classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbls class and, in addition to an F-box, contains at least six highly degenerated leucine-rich repeats. This family member plays a role in epigenetic silencing. It nucleates at CpG islands and specifically demethylates both mono- and di-methylated lysine-36 of histone H3. Alternative splicing results in multiple transcript variants. lysine demethylase 2A 22992
DDX20 ENSG00000064703 DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of this family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. This gene encodes a DEAD box protein, which has an ATPase activity and is a component of the survival of motor neurons (SMN) complex. This protein interacts directly with SMN, the spinal muscular atrophy gene product, and may play a catalytic role in the function of the SMN complex on RNPs. DEAD-box helicase 20 11218
NEBL ENSG00000078114 This gene encodes a nebulin like protein that is abundantly expressed in cardiac muscle. The encoded protein binds actin and interacts with thin filaments and Z-line associated proteins in striated muscle. This protein may be involved in cardiac myofibril assembly. A shorter isoform of this protein termed LIM nebulette is expressed in non-muscle cells and may function as a component of focal adhesion complexes. Alternate splicing results in multiple transcript variants. nebulette 10529
LOC101927055 ENSG00000237298 NA uncharacterized LOC101927055 101927055
TTN-AS1 ENSG00000237298 NA TTN antisense RNA 1 100506866
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_",2,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
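
In the Factor 2 table above, the Ensembl ID ENSG00000237298 is returned twice (once as LOC101927055 and once as TTN-AS1), so `out$query` repeats it and the exported file does too. The following is a minimal sketch, assuming one ID per line is wanted in the exported list. The data frame `out_toy` and the helper `unique_queries` are hypothetical stand-ins and are not part of the original analysis.

```r
# Toy stand-in for the data frame returned by the annotation query;
# only the two columns needed here are reproduced.
out_toy <- data.frame(
  query  = c("ENSG00000078114", "ENSG00000237298", "ENSG00000237298"),
  symbol = c("NEBL", "LOC101927055", "TTN-AS1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep each queried ID once, in first-seen order.
unique_queries <- function(df) {
  unique(as.character(df$query))
}

ids <- unique_queries(out_toy)

# Write one ID per line to a temporary file (the real output path is
# left to the caller) and read it back to confirm the contents.
tmp <- tempfile(fileext = ".txt")
write.table(ids, tmp, col.names = FALSE, row.names = FALSE, quote = FALSE)
written <- readLines(tmp)
```

Whether duplicates should be dropped depends on how the downstream tool consumes the list, so this may not be appropriate in every case.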

Factor 3 Annotations

out <- mygene::queryMany(gene_list[3,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
summary X_id query symbol name notfound
Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. 72 ENSG00000163017 ACTG2 actin, gamma 2, smooth muscle, enteric NA
The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. 59 ENSG00000107796 ACTA2 actin, alpha 2, smooth muscle, aorta NA
This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. 7273 ENSG00000155657 TTN titin NA
The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. 3043 ENSG00000244734 HBB hemoglobin subunit beta NA
The collagens are a superfamily of proteins that play a role in maintaining the integrity of various tissues. Collagens are extracellular matrix proteins and have a triple-helical domain as their common structural element. Collagen VI is a major structural component of microfibrils. The basic structural unit of collagen VI is a heterotrimer of the alpha1(VI), alpha2(VI), and alpha3(VI) chains. The alpha2(VI) and alpha3(VI) chains are encoded by the COL6A2 and COL6A3 genes, respectively. The protein encoded by this gene is the alpha 1 subunit of type VI collagen (alpha1(VI) chain). Mutations in the genes that code for the collagen VI subunits result in the autosomal dominant disorder, Bethlem myopathy. 1291 ENSG00000142156 COL6A1 collagen type VI alpha 1 NA
The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. 58 ENSG00000143632 ACTA1 actin, alpha 1, skeletal muscle NA
The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. 4629 ENSG00000133392 MYH11 myosin, heavy chain 11, smooth muscle NA
NA ENSG00000180139 ENSG00000180139 ACTA2-AS1 ACTA2 antisense RNA 1 NA
This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3’ region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts. 4638 ENSG00000065534 MYLK myosin light chain kinase NA
This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. 60 ENSG00000075624 ACTB actin, beta NA
The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may be also involved in the regulation of non-rapid eye movement sleep. 5730 ENSG00000107317 PTGDS prostaglandin D2 synthase NA
The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. 4155 ENSG00000197971 MBP myelin basic protein NA
Myosin is a hexameric ATPase cellular motor protein. It is composed of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable regulatory light chains. This gene encodes a myosin alkali light chain that is expressed in smooth muscle and non-muscle tissues. Genomic sequences representing several pseudogenes have been described and two transcript variants encoding different isoforms have been identified for this gene. 4637 ENSG00000092841 MYL6 myosin light chain 6 NA
The protein encoded by this gene is an intermediate filament (IF) family member. IF proteins are cytoskeletal proteins that confer resistance to mechanical stress and are encoded by a dispersed multigene family. This protein has been found to form a linkage between desmin, which is a subunit of the IF network, and the extracellular matrix, and provides an important structural support in muscle. Two alternatively spliced variants encoding different isoforms have been described for this gene. 23336 ENSG00000182253 SYNM synemin NA
Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, cytoskeletal, alpha actinin isoform and maps to the same site as the structurally similar erythroid beta spectrin gene. Three transcript variants encoding different isoforms have been found for this gene. 87 ENSG00000072110 ACTN1 actinin alpha 1 NA
Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. 10398 ENSG00000101335 MYL9 myosin light chain 9 NA
Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene acts as a monomer, is induced by retinoic acid, and appears to be involved in apoptosis. Finally, the encoded protein is the autoantigen implicated in celiac disease. Two transcript variants encoding different isoforms have been found for this gene. 7052 ENSG00000198959 TGM2 transglutaminase 2 NA
NA NA ENSG00000259716 NA NA TRUE
The obscurin gene spans more than 150 kb, contains over 80 exons and encodes a protein of approximately 720 kDa. The encoded protein contains 68 Ig domains, 2 fibronectin domains, 1 calcium/calmodulin-binding domain, 1 RhoGEF domain with an associated PH domain, and 2 serine-threonine kinase domains. This protein belongs to the family of giant sarcomeric signaling proteins that includes titin and nebulin, and may have a role in the organization of myofibrils during assembly and may mediate interactions between the sarcoplasmic reticulum and myofibrils. Alternatively spliced transcript variants encoding different isoforms have been identified. 84033 ENSG00000154358 OBSCN obscurin, cytoskeletal calmodulin and titin-interacting RhoGEF NA
The protein encoded by this gene belongs to the B-cell CLL/lymphoma 2 and adenovirus E1B 19 kDa interacting family, whose members play roles in many cellular processes including apoptosis, cell transformation, and synaptic function. Several functions for this protein have been demonstrated including suppression of Ras homolog family member A activity, which results in reduced stress fiber formation and suppression of oncogenic cellular transformation. A high molecular weight isoform of this protein has also been shown to colocalize with Adaptor protein complex 2, beta-Adaptin and endodermal markers, suggesting an involvement in post-endocytic trafficking. In prostate cancer cells, this gene acts as a tumor suppressor and its expression is regulated by prostate cancer antigen 3, a non-protein coding gene on the opposite DNA strand in an intron of this gene. Prostate cancer antigen 3 regulates levels of this gene through formation of a double-stranded RNA that undergoes adenosine deaminase acting on RNA-dependent adenosine-to-inosine RNA editing. Alternative splicing results in multiple transcript variants. 158471 ENSG00000106772 PRUNE2 prune homolog 2 NA
The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. 3040 ENSG00000188536 HBA2 hemoglobin subunit alpha 2 NA
This gene encodes a member of a small family of focal adhesion proteins which interacts with ILK (integrin-linked kinase), a protein which effects protein-protein interactions with the extracellular matrix. The encoded protein has five LIM domains, each domain forming two zinc fingers, which permit interactions which regulate cell shape and migration. A pseudogene of this gene is located on chromosome 4. Multiple transcript variants encoding different isoforms have been found for this gene. 55679 ENSG00000072163 LIMS2 LIM zinc finger domain containing 2 NA
The protein encoded by this gene is a transformation and shape-change sensitive actin cross-linking/gelling protein found in fibroblasts and smooth muscle. Its expression is down-regulated in many cell lines, and this down-regulation may be an early and sensitive marker for the onset of transformation. A functional role of this protein is unclear. Two transcript variants encoding the same protein have been found for this gene. 6876 ENSG00000149591 TAGLN transgelin NA
This gene encodes an enzyme involved in fatty acid biosynthesis, primarily the synthesis of oleic acid. The protein belongs to the fatty acid desaturase family and is an integral membrane protein located in the endoplasmic reticulum. Transcripts of approximately 3.9 and 5.2 kb, differing only by alternative polyadenylation signals, have been detected. A gene encoding a similar enzyme is located on chromosome 4 and a pseudogene of this gene is located on chromosome 17. 6319 ENSG00000099194 SCD stearoyl-CoA desaturase NA
NA ENSG00000269936 ENSG00000269936 RP11-394O4.5 NA NA
The product of this gene belongs to the actin-binding proteins ADF family. This family of proteins is responsible for enhancing the turnover rate of actin in vivo. This gene encodes the actin depolymerizing protein that severs actin filaments (F-actin) and binds to actin monomers (G-actin). Two transcript variants encoding distinct isoforms have been identified for this gene. 11034 ENSG00000125868 DSTN destrin, actin depolymerizing factor NA
This gene encodes the regulatory light chain associated with cardiac myosin beta (or slow) heavy chain. Ca2+ triggers the phosphorylation of regulatory light chain that in turn triggers contraction. Mutations in this gene are associated with mid-left ventricular chamber type hypertrophic cardiomyopathy. 4633 ENSG00000111245 MYL2 myosin light chain 2 NA
NA 4162 ENSG00000076706 MCAM melanoma cell adhesion molecule NA
This gene encodes a member of the cysteine-rich protein (CSRP) family. This gene family includes a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this gene product occurs in proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Alternatively spliced transcript variants have been described. 1465 ENSG00000159176 CSRP1 cysteine and glycine rich protein 1 NA
This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. 2878 ENSG00000211445 GPX3 glutathione peroxidase 3 NA
NA 25959 ENSG00000197256 KANK2 KN motif and ankyrin repeat domains 2 NA
This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. 7168 ENSG00000140416 TPM1 tropomyosin 1 (alpha) NA
NA ENSG00000259627 ENSG00000259627 RP11-244F12.2 NA NA
This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. 2670 ENSG00000131095 GFAP glial fibrillary acidic protein NA
The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and as a cytokine. Altered expression of this protein is associated with the disease cystic fibrosis. Multiple transcript variants encoding different isoforms have been found for this gene. 6279 ENSG00000143546 S100A8 S100 calcium binding protein A8 NA
Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a muscle-specific, alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Several transcript variants encoding different isoforms have been found for this gene. 88 ENSG00000077522 ACTN2 actinin alpha 2 NA
Troponin is a central regulatory protein of striated muscle contraction, and together with tropomyosin, is located on the actin filament. Troponin consists of 3 subunits: TnI, which is the inhibitor of actomyosin ATPase; TnT, which contains the binding site for tropomyosin; and TnC, the protein encoded by this gene. The binding of calcium to TnC abolishes the inhibitory action of TnI, thus allowing the interaction of actin with myosin, the hydrolysis of ATP, and the generation of tension. Mutations in this gene are associated with cardiomyopathy dilated type 1Z. 7134 ENSG00000114854 TNNC1 troponin C1, slow skeletal and cardiac type NA
The protein encoded by this gene belongs to the protein phosphatase 1 (PP1) inhibitor family. This protein is an inhibitor of smooth muscle myosin phosphatase, and has higher inhibitory activity when phosphorylated. Inhibition of myosin phosphatase leads to increased myosin phosphorylation and enhanced smooth muscle contraction. Alternatively spliced transcript variants encoding different isoforms have been noted for this gene. 94274 ENSG00000167641 PPP1R14A protein phosphatase 1 regulatory inhibitor subunit 14A NA
The protein encoded by this gene belongs to the perilipin family, members of which coat intracellular lipid storage droplets. This protein is associated with the lipid globule surface membrane material, and may be involved in development and maintenance of adipose tissue. However, it is not restricted to adipocytes as previously thought, but is found in a wide range of cultured cell lines, including fibroblasts, endothelial and epithelial cells, and tissues, such as lactating mammary gland, adrenal cortex, Sertoli and Leydig cells, and hepatocytes in alcoholic liver cirrhosis, suggesting that it may serve as a marker of lipid accumulation in diverse cell types and diseases. Alternatively spliced transcript variants have been found for this gene. 123 ENSG00000147872 PLIN2 perilipin 2 NA
The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. 6280 ENSG00000163220 S100A9 S100 calcium binding protein A9 NA
This gene encodes one of the vertebrate laminin alpha chains. Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. The protein encoded by this gene is the alpha-5 subunit of laminin-10 (laminin-511), laminin-11 (laminin-521) and laminin-15 (laminin-523). 3911 ENSG00000130702 LAMA5 laminin subunit alpha 5 NA
The protein encoded by this gene is a plasma membrane protein that is important in spermatogenesis, embryo implantation, neural network formation, and tumor progression. The encoded protein is also a member of the immunoglobulin superfamily. Multiple transcript variants encoding different isoforms have been found for this gene. 682 ENSG00000172270 BSG basigin (Ok blood group) NA
The protein encoded by this gene belongs to the family of P-type primary ion transport ATPases characterized by the formation of an aspartyl phosphate intermediate during the reaction cycle. These enzymes remove bivalent calcium ions from eukaryotic cells against very large concentration gradients and play a critical role in intracellular calcium homeostasis. The mammalian plasma membrane calcium ATPase isoforms are encoded by at least four separate genes and the diversity of these enzymes is further increased by alternative splicing of transcripts. The expression of different isoforms and splice variants is regulated in a developmental, tissue- and cell type-specific manner, suggesting that these pumps are functionally adapted to the physiological needs of particular cells and tissues. This gene encodes the plasma membrane calcium ATPase isoform 4. Alternatively spliced transcript variants encoding different isoforms have been identified. 493 ENSG00000058668 ATP2B4 ATPase plasma membrane Ca2+ transporting 4 NA
This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. 1277 ENSG00000108821 COL1A1 collagen type I alpha 1 NA
The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. 4256 ENSG00000111341 MGP matrix Gla protein NA
NA NA ENSG00000256545 NA NA TRUE
The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. 1158 ENSG00000104879 CKM creatine kinase, M-type NA
This gene is a member of the neuronal calcium sensor gene family, which encode calcium-binding proteins expressed predominantly in neurons. The protein encoded by this gene regulates G protein-coupled receptor phosphorylation in a calcium-dependent manner and can substitute for calmodulin. The protein is associated with secretory granules and modulates synaptic transmission and synaptic plasticity. Multiple transcript variants encoding different isoforms have been found for this gene. 23413 ENSG00000107130 NCS1 neuronal calcium sensor 1 NA
This gene encodes a member of the NAD-dependent glycerol-3-phosphate dehydrogenase family. The encoded protein plays a critical role in carbohydrate and lipid metabolism by catalyzing the reversible conversion of dihydroxyacetone phosphate (DHAP) and reduced nicotinamide adenine dinucleotide (NADH) to glycerol-3-phosphate (G3P) and NAD+. The encoded cytosolic protein and mitochondrial glycerol-3-phosphate dehydrogenase also form a glycerol phosphate shuttle that facilitates the transfer of reducing equivalents from the cytosol to mitochondria. Mutations in this gene are a cause of transient infantile hypertriglyceridemia. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 2819 ENSG00000167588 GPD1 glycerol-3-phosphate dehydrogenase 1 NA
This gene encodes ubiquitin, one of the most conserved proteins known. Ubiquitin has a major role in targeting cellular proteins for degradation by the 26S proteasome. It is also involved in the maintenance of chromatin structure, the regulation of gene expression, and the stress response. Ubiquitin is synthesized as a precursor protein consisting of either polyubiquitin chains or a single ubiquitin moiety fused to an unrelated protein. This gene consists of three direct repeats of the ubiquitin coding sequence with no spacer sequence. Consequently, the protein is expressed as a polyubiquitin precursor with a final amino acid after the last repeat. An aberrant form of this protein has been detected in patients with Alzheimer’s disease and Down syndrome. Pseudogenes of this gene are located on chromosomes 1, 2, 13, and 17. Alternative splicing results in multiple transcript variants. 7314 ENSG00000170315 UBB ubiquitin B NA
This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. 8490 ENSG00000143248 RGS5 regulator of G-protein signaling 5 NA
This gene encodes a preproprotein that is proteolytically processed to form multiple protein products. The major encoded protein product, lactadherin, is a membrane glycoprotein that promotes phagocytosis of apoptotic cells. This protein has also been implicated in wound healing, autoimmune disease, and cancer. Lactadherin can be further processed to form a smaller cleavage product, medin, which comprises the major protein component of aortic medial amyloid (AMA). Alternative splicing results in multiple transcript variants. 4240 ENSG00000140545 MFGE8 milk fat globule-EGF factor 8 protein NA
This gene encodes a muscle enzyme involved in glycogenolysis. Highly similar enzymes encoded by different genes are found in liver and brain. Mutations in this gene are associated with McArdle disease (myophosphorylase deficiency), a glycogen storage disease of muscle. Alternative splicing results in multiple transcript variants. 5837 ENSG00000068976 PYGM phosphorylase, glycogen, muscle NA
This gene encodes a cytoskeletal protein that is required for organizing the actin cytoskeleton. The protein is a component of actin-containing microfilaments, and it is involved in the control of cell shape, adhesion, and contraction. Polymorphisms in this gene are associated with a susceptibility to pancreatic cancer type 1, and also with a risk for myocardial infarction. Alternative splicing results in multiple transcript variants. 23022 ENSG00000129116 PALLD palladin, cytoskeletal associated protein NA
NA 1809 ENSG00000113657 DPYSL3 dihydropyrimidinase like 3 NA
The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. 3860 ENSG00000171401 KRT13 keratin 13 NA
The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. 3039 ENSG00000206172 HBA1 hemoglobin subunit alpha 1 NA
Sarcomere assembly is regulated by the muscle protein titin. Titin is a giant elastic protein with kinase activity that extends half the length of a sarcomere. It serves as a scaffold to which myofibrils and other muscle related proteins are attached. This gene encodes a protein found in striated and cardiac muscle that binds to the titin Z1-Z2 domains and is a substrate of titin kinase, interactions thought to be critical to sarcomere assembly. Mutations in this gene are associated with limb-girdle muscular dystrophy type 2G. 8557 ENSG00000173991 TCAP titin-cap NA
The leiomodin 1 protein has a putative membrane-spanning region and 2 types of tandemly repeated blocks. The transcript is expressed in all tissues tested, with the highest levels in thyroid, eye muscle, skeletal muscle, and ovary. Increased expression of leiomodin 1 may be linked to Graves’ disease and thyroid-associated ophthalmopathy. 25802 ENSG00000163431 LMOD1 leiomodin 1 NA
The protein encoded by this gene is representative of a family of proteins composed of conserved PDZ and LIM domains. LIM domains are proposed to function in protein-protein recognition in a variety of contexts including gene transcription and development and in cytoskeletal interaction. The LIM domains of this protein bind to protein kinases, whereas the PDZ domain binds to actin filaments. The gene product is involved in the assembly of an actin filament-associated complex essential for transmission of ret/ptc2 mitogenic signaling. The biological function is likely to be that of an adapter, with the PDZ domain localizing the LIM-binding proteins to actin filaments of both skeletal muscle and nonmuscle tissues. Alternative splicing of this gene results in multiple transcript variants. 9260 ENSG00000196923 PDLIM7 PDZ and LIM domain 7 NA
This gene is a member of the PDK/BCKDK protein kinase family and encodes a mitochondrial protein with a histidine kinase domain. This protein is located in the matrix of the mitochondria and inhibits the pyruvate dehydrogenase complex by phosphorylating one of its subunits, thereby contributing to the regulation of glucose metabolism. Expression of this gene is regulated by glucocorticoids, retinoic acid and insulin. 5166 ENSG00000004799 PDK4 pyruvate dehydrogenase kinase 4 NA
Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. 4624 ENSG00000197616 MYH6 myosin, heavy chain 6, cardiac muscle, alpha NA
This gene encodes the pro-alpha1 chains of type III collagen, a fibrillar collagen that is found in extensible connective tissues such as skin, lung, uterus, intestine and the vascular system, frequently in association with type I collagen. Mutations in this gene are associated with Ehlers-Danlos syndrome type IV, and with aortic and arterial aneurysms. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. 1281 ENSG00000168542 COL3A1 collagen type III alpha 1 chain NA
Microtubules of the eukaryotic cytoskeleton perform essential and diverse functions and are composed of a heterodimer of alpha and beta tubulins. The genes encoding these microtubule constituents belong to the tubulin superfamily, which is composed of six distinct families. Genes from the alpha, beta and gamma tubulin families are found in all eukaryotes. The alpha and beta tubulins represent the major components of microtubules, while gamma tubulin plays a critical role in the nucleation of microtubule assembly. There are multiple alpha and beta tubulin genes, which are highly conserved among species. This gene encodes alpha tubulin and is highly similar to the mouse and rat Tuba1 genes. Northern blotting studies have shown that the gene expression is predominantly found in morphologically differentiated neurologic cells. This gene is one of three alpha-tubulin genes in a cluster on chromosome 12q. Mutations in this gene cause lissencephaly type 3 (LIS3) - a neurological condition characterized by microcephaly, mental retardation, and early-onset epilepsy and caused by defective neuronal migration. Alternative splicing results in multiple transcript variants encoding distinct isoforms. 7846 ENSG00000167552 TUBA1A tubulin alpha 1a NA
This gene encodes a protein that is a subunit of troponin, which is a regulatory complex located on the thin filament of the sarcomere. This complex regulates striated muscle contraction in response to fluctuations in intracellular calcium concentration. This complex is composed of three subunits: troponin C, which binds calcium, troponin T, which binds tropomyosin, and troponin I, which is an inhibitory subunit. This protein is the slow skeletal troponin T subunit. Mutations in this gene cause nemaline myopathy type 5, also known as Amish nemaline myopathy, a neuromuscular disorder characterized by muscle weakness and rod-shaped, or nemaline, inclusions in skeletal muscle fibers which affects infants, resulting in death due to respiratory insufficiency, usually in the second year. Multiple transcript variants encoding different isoforms have been found for this gene. 7138 ENSG00000105048 TNNT1 troponin T1, slow skeletal type NA
This gene is located in an imprinted region of chromosome 11 near the insulin-like growth factor 2 (IGF2) gene. This gene is only expressed from the maternally-inherited chromosome, whereas IGF2 is only expressed from the paternally-inherited chromosome. The product of this gene is a long non-coding RNA which functions as a tumor suppressor. Mutations in this gene have been associated with Beckwith-Wiedemann Syndrome and Wilms tumorigenesis. Alternative splicing results in multiple transcript variants. 283120 ENSG00000130600 H19 H19, imprinted maternally expressed transcript (non-protein coding) NA
Regulator of G protein signaling (RGS) family members are regulatory molecules that act as GTPase activating proteins (GAPs) for G alpha subunits of heterotrimeric G proteins. RGS proteins are able to deactivate G protein subunits of the Gi alpha, Go alpha and Gq alpha subtypes. They drive G proteins into their inactive GDP-bound forms. Regulator of G protein signaling 2 belongs to this family. The protein acts as a mediator of myeloid differentiation and may play a role in leukemogenesis. 5997 ENSG00000116741 RGS2 regulator of G-protein signaling 2 NA
This gene encodes a member of carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. 165 ENSG00000106624 AEBP1 AE binding protein 1 NA
This gene produces a long non-coding RNA (lncRNA) transcribed from the multiple endocrine neoplasia locus. This lncRNA is retained in the nucleus where it forms the core structural component of the paraspeckle sub-organelles. It may act as a transcriptional regulator for numerous genes, including some genes involved in cancer progression. 283131 ENSG00000245532 NEAT1 nuclear paraspeckle assembly transcript 1 (non-protein coding) NA
This gene represents a ubiquitin gene, ubiquitin C. The encoded protein is a polyubiquitin precursor. Conjugation of ubiquitin monomers or polymers can lead to various effects within a cell, depending on the residues to which ubiquitin is conjugated. Ubiquitination has been associated with protein degradation, DNA repair, cell cycle regulation, kinase modification, endocytosis, and regulation of other cell signaling pathways. 7316 ENSG00000150991 UBC ubiquitin C NA
This gene encodes a member of the adenylosuccinate synthase family of proteins. The encoded muscle-specific enzyme plays a role in the purine nucleotide cycle by catalyzing the first step in the conversion of inosine monophosphate (IMP) to adenosine monophosphate (AMP). Mutations in this gene may cause adolescent onset distal myopathy. Alternative splicing results in multiple transcript variants. 122622 ENSG00000185100 ADSSL1 adenylosuccinate synthase like 1 NA
NA ENSG00000266844 ENSG00000266844 RP11-862L9.3 NA NA
This gene encodes a structural protein that is found exclusively in contractile smooth muscle cells. It associates with stress fibers and constitutes part of the cytoskeleton. This gene is localized to chromosome 22q12.3, distal to the TUPLE1 locus and outside the DiGeorge syndrome deletion. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. 6525 ENSG00000183963 SMTN smoothelin NA
The protein encoded by this gene is an inducible molecular chaperone that functions as a homodimer. The encoded protein aids in the proper folding of specific target proteins by use of an ATPase activity that is modulated by co-chaperones. Two transcript variants encoding different isoforms have been found for this gene. 3320 ENSG00000080824 HSP90AA1 heat shock protein 90kDa alpha family class A member 1 NA
Carbonic anhydrase III (CAIII) is a member of a multigene family (at least six separate genes are known) that encodes carbonic anhydrase isozymes. These carbonic anhydrases are a class of metalloenzymes that catalyze the reversible hydration of carbon dioxide and are differentially expressed in a number of cell types. The expression of the CA3 gene is strictly tissue specific and present at high levels in skeletal muscle and much lower levels in cardiac and smooth muscle. A proportion of carriers of Duchenne muscular dystrophy have a higher CA3 level than normal. The gene spans 10.3 kb and contains seven exons and six introns. 761 ENSG00000164879 CA3 carbonic anhydrase 3 NA
This protein belongs to the lipocalin family and is the specific carrier for retinol (vitamin A alcohol) in the blood. It delivers retinol from the liver stores to the peripheral tissues. In plasma, the RBP-retinol complex interacts with transthyretin which prevents its loss by filtration through the kidney glomeruli. A deficiency of vitamin A blocks secretion of the binding protein posttranslationally and results in defective delivery and supply to the epidermal cells. 5950 ENSG00000138207 RBP4 retinol binding protein 4 NA
The protein encoded by this gene is a secreted protein that is similar to the beta- and gamma-chains of fibrinogen. The carboxyl-terminus of the encoded protein consists of the fibrinogen-related domains (FRED). The encoded protein forms a tetrameric complex which is stabilized by interchain disulfide bonds. This protein may play a role in physiologic functions at mucosal sites. 10875 ENSG00000127951 FGL2 fibrinogen like 2 NA
NA 388 ENSG00000143878 RHOB ras homolog family member B NA
NA ENSG00000261054 ENSG00000261054 RP11-6O2.4 NA NA
This gene encodes the skeletal muscle specific member of the calsequestrin protein family. Calsequestrin functions as a luminal sarcoplasmic reticulum calcium sensor in both cardiac and skeletal muscle cells. This protein, also known as calmitine, functions as a calcium regulator in the mitochondria of skeletal muscle. This protein is absent in patients with Duchenne and Becker types of muscular dystrophy. 844 ENSG00000143318 CASQ1 calsequestrin 1 NA
Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. 4625 ENSG00000092054 MYH7 myosin, heavy chain 7, cardiac muscle, beta NA
NA 51177 ENSG00000023902 PLEKHO1 pleckstrin homology domain containing O1 NA
This gene encodes a glycoprotein involved in hemostasis. The encoded preproprotein is proteolytically processed following assembly into large multimeric complexes. These complexes function in the adhesion of platelets to sites of vascular injury and the transport of various proteins in the blood. Mutations in this gene result in von Willebrand disease, an inherited bleeding disorder. An unprocessed pseudogene has been found on chromosome 22. 7450 ENSG00000110799 VWF von Willebrand factor NA
This gene encodes a member of the myosin superfamily. The protein represents a conventional non-muscle myosin; it should not be confused with the unconventional myosin-10 (MYO10). Myosins are actin-dependent motor proteins with diverse functions including regulation of cytokinesis, cell motility, and cell polarity. Mutations in this gene have been associated with May-Hegglin anomaly and developmental defects in brain and heart. Multiple transcript variants encoding different isoforms have been found for this gene. 4628 ENSG00000133026 MYH10 myosin, heavy chain 10, non-muscle NA
This gene encodes a member of the RNA recognition motif family of RNA-binding proteins. The RNA recognition motif is between 80-100 amino acids in length and family members contain one to four copies of the motif. The RNA recognition motif consists of two short stretches of conserved sequence, as well as a few highly conserved hydrophobic residues. The encoded protein has a single, putative RNA recognition motif in its N-terminus. Alternative splicing results in multiple transcript variants encoding different isoforms. 11030 ENSG00000157110 RBPMS RNA binding protein with multiple splicing NA
This gene encodes a member of the glutathione peroxidase protein family. Glutathione peroxidase catalyzes the reduction of hydrogen peroxide, organic hydroperoxide, and lipid peroxides by reduced glutathione and functions in the protection of cells against oxidative damage. Human plasma glutathione peroxidase has been shown to be a selenium-containing enzyme and the UGA codon is translated into a selenocysteine. The encoded protein has been identified as a moonlighting protein based on its ability to serve dual functions as a peroxidase as well as a structural protein in mature spermatozoa. Through alternative splicing and transcription initiation, rat produces proteins that localize to the nucleus, mitochondrion, and cytoplasm. In humans, alternative transcription initiation and the cleavage sites of the mitochondrial and nuclear transit peptides need to be experimentally verified. Alternative splicing results in multiple transcript variants. 2879 ENSG00000167468 GPX4 glutathione peroxidase 4 NA
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3851 ENSG00000170477 KRT4 keratin 4 NA
The protein encoded by this gene is primarily expressed in the skeletal muscle, and belongs to the myozenin family. Members of this family function as calcineurin-interacting proteins that help tether calcineurin to the sarcomere of cardiac and skeletal muscle. They play an important role in modulation of calcineurin signaling. 58529 ENSG00000177791 MYOZ1 myozenin 1 NA
This gene encodes a member of the profilin family of small actin-binding proteins. The encoded protein plays an important role in actin dynamics by regulating actin polymerization in response to extracellular signals. Deletion of this gene is associated with Miller-Dieker syndrome, and the encoded protein may also play a role in Huntington disease. Multiple pseudogenes of this gene are located on chromosome 1. 5216 ENSG00000108518 PFN1 profilin 1 NA
NA 51559 ENSG00000111696 NT5DC3 5’-nucleotidase domain containing 3 NA
This gene encodes a Pleckstrin homology and SEC7 domains-containing protein that functions as a guanine nucleotide exchange factor. The encoded protein regulates signal transduction by activating ADP-ribosylation factor 6. Alternative splicing results in multiple transcript variants. 5662 ENSG00000059915 PSD pleckstrin and Sec7 domain containing NA
This gene encodes one of the three enolase isoenzymes found in mammals. This isoenzyme is found in skeletal muscle cells in the adult where it may play a role in muscle development and regeneration. A switch from alpha enolase to beta enolase occurs in muscle tissue during development in rodents. Mutations in this gene have been associated with glycogen storage disease. Alternatively spliced transcript variants encoding different isoforms have been described. 2027 ENSG00000108515 ENO3 enolase 3 NA
Fibulin 1 is a secreted glycoprotein that becomes incorporated into a fibrillar extracellular matrix. Calcium-binding is apparently required to mediate its binding to laminin and nidogen. It mediates platelet adhesion via binding fibrinogen. Four splice variants which differ in the 3’ end have been identified. Each variant encodes a different isoform, but no functional distinctions have been identified among the four variants. 2192 ENSG00000077942 FBLN1 fibulin 1 NA
NA 79026 ENSG00000124942 AHNAK AHNAK nucleoprotein NA
This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. 3858 ENSG00000186395 KRT10 keratin 10 NA
This gene encodes a member of the membrane-bound adenylyl cyclase enzymes. Adenylyl cyclases mediate G protein-coupled receptor signaling through the synthesis of the second messenger cAMP. Activity of the encoded protein is stimulated by the Gs alpha subunit of G protein-coupled receptors and is inhibited by protein kinase A, calcium and Gi alpha subunits. Single nucleotide polymorphisms in this gene may be associated with low birth weight and type 2 diabetes. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. 111 ENSG00000173175 ADCY5 adenylate cyclase 5 NA
This protein belongs to the aldehyde dehydrogenases family of proteins. Aldehyde dehydrogenase is the second enzyme of the major oxidative pathway of alcohol metabolism. This gene does not contain introns in the coding sequence. The variation of this locus may affect the development of alcohol-related problems. 219 ENSG00000137124 ALDH1B1 aldehyde dehydrogenase 1 family member B1 NA
The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. 2194 ENSG00000169710 FASN fatty acid synthase NA
This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 2) is expressed in brain, heart and skeletal muscle, and the other isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas. This gene may be critical in the development of ovarian cancer. 1917 ENSG00000101210 EEF1A2 eukaryotic translation elongation factor 1 alpha 2 NA
NA 5364 ENSG00000164050 PLXNB1 plexin B1 NA
# Save the queried Ensembl IDs for factor 3, one ID per line
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_", 3, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
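
The export step above hard-codes the factor index in the file name. A minimal sketch of wrapping it in a helper is given below. The function `write_factor_genes` and its arguments are hypothetical names, not part of the original analysis, and `tempdir()` stands in for the real output directory so the example runs anywhere.

```r
# Hypothetical helper (an assumption, not from the original analysis):
# write the Ensembl IDs for factor k to "gene_names_clus_<k>.txt" in out_dir.
write_factor_genes <- function(ids, k, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  # A character vector is coerced to a one-column data frame, so each ID
  # ends up on its own line.
  write.table(ids, path, col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Toy usage with two IDs from the factor 3 table (RBPMS and GPX4).
ids  <- c("ENSG00000157110", "ENSG00000167468")
path <- write_factor_genes(ids, k = 3, out_dir = tempdir())
readLines(path)
```

In the analysis itself the call would presumably take `out$query` as `ids` and the `../utilities/GTEX2013_sparse_load_sqrt` directory as `out_dir`.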

Factor 4 Annotations

out <- mygene::queryMany(gene_list[4,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
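
Not every queried ID resolves to an annotation: the `notfound` column of the result is TRUE for such queries (for example ENSG00000256545 in the factor 4 table). A minimal sketch of separating those rows before tabulating is given below. It uses a toy data frame `res` as a stand-in for `as.data.frame(out)`; the names `res`, `missing_ids` and `annotated` are illustrative assumptions.

```r
# Toy stand-in for as.data.frame(out); rows copied from the factor 4 results.
res <- data.frame(
  query    = c("ENSG00000163359", "ENSG00000256545", "ENSG00000111799"),
  symbol   = c("COL6A3", NA, "COL12A1"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# notfound is NA for matched queries, so compare with %in% rather than ==
# to keep NA out of the row index. The column is assumed to be possibly
# absent when every query matches, hence the guard.
unmatched <- if ("notfound" %in% names(res)) {
  res$notfound %in% TRUE
} else {
  rep(FALSE, nrow(res))
}

missing_ids <- res$query[unmatched]                    # IDs with no annotation
annotated   <- res[!unmatched, c("query", "symbol")]   # rows safe to tabulate
```

Keeping `missing_ids` around makes it easy to report how many of the top features per factor could not be annotated.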
kable(as.data.frame(out))
query symbol X_id summary name notfound
ENSG00000163359 COL6A3 1293 This gene encodes the alpha-3 chain, one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The alpha-3 chain of type VI collagen is much larger than the alpha-1 and -2 chains. This difference in size is largely due to an increase in the number of subdomains, similar to von Willebrand Factor type A domains, that are found in the amino terminal globular domain of all the alpha chains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in the type VI collagen genes are associated with Bethlem myopathy, a rare autosomal dominant proximal myopathy with early childhood onset. Mutations in this gene are also a cause of Ullrich congenital muscular dystrophy, also referred to as Ullrich scleroatonic muscular dystrophy, an autosomal recessive congenital myopathy that is more severe than Bethlem myopathy. Multiple transcript variants have been identified, but the full-length nature of only some of these variants has been described. collagen type VI alpha 3 chain NA
ENSG00000111799 COL12A1 1303 This gene encodes the alpha chain of type XII collagen, a member of the FACIT (fibril-associated collagens with interrupted triple helices) collagen family. Type XII collagen is a homotrimer found in association with type I collagen, an association that is thought to modify the interactions between collagen I fibrils and the surrounding matrix. Alternatively spliced transcript variants encoding different isoforms have been identified. collagen type XII alpha 1 chain NA
ENSG00000175899 A2M 2 Alpha-2-macroglobulin is a protease inhibitor and cytokine transporter. It inhibits many proteases, including trypsin, thrombin and collagenase. A2M is implicated in Alzheimer disease (AD) due to its ability to mediate the clearance and degradation of A-beta, the major component of beta-amyloid deposits. alpha-2-macroglobulin NA
ENSG00000170323 FABP4 2167 FABP4 encodes the fatty acid binding protein found in adipocytes. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. It is thought that FABPs roles include fatty acid uptake, transport, and metabolism. fatty acid binding protein 4 NA
ENSG00000166819 PLIN1 5346 The protein encoded by this gene coats lipid storage droplets in adipocytes, thereby protecting them until they can be broken down by hormone-sensitive lipase. The encoded protein is the major cAMP-dependent protein kinase substrate in adipocytes and, when unphosphorylated, may play a role in the inhibition of lipolysis. Alternatively spliced transcript variants varying in the 5’ UTR, but encoding the same protein, have been found for this gene. perilipin 1 NA
ENSG00000196549 MME 4311 This gene encodes a common acute lymphocytic leukemia antigen that is an important cell surface marker in the diagnosis of human acute lymphocytic leukemia (ALL). This protein is present on leukemic cells of pre-B phenotype, which represent 85% of cases of ALL. This protein is not restricted to leukemic cells, however, and is found on a variety of normal tissues. It is a glycoprotein that is particularly abundant in kidney, where it is present on the brush border of proximal tubules and on glomerular epithelium. The protein is a neutral endopeptidase that cleaves peptides at the amino side of hydrophobic residues and inactivates several peptide hormones including glucagon, enkephalins, substance P, neurotensin, oxytocin, and bradykinin. This gene, which encodes a 100-kD type II transmembrane glycoprotein, exists in a single copy of greater than 45 kb. The 5’ untranslated region of this gene is alternatively spliced, resulting in four separate mRNA transcripts. The coding region is not affected by alternative splicing. membrane metallo-endopeptidase NA
ENSG00000169710 FASN 2194 The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. fatty acid synthase NA
ENSG00000099194 SCD 6319 This gene encodes an enzyme involved in fatty acid biosynthesis, primarily the synthesis of oleic acid. The protein belongs to the fatty acid desaturase family and is an integral membrane protein located in the endoplasmic reticulum. Transcripts of approximately 3.9 and 5.2 kb, differing only by alternative polyadenylation signals, have been detected. A gene encoding a similar enzyme is located on chromosome 4 and a pseudogene of this gene is located on chromosome 17. stearoyl-CoA desaturase NA
ENSG00000166923 GREM1 26585 This gene encodes a member of the BMP (bone morphogenic protein) antagonist family. Like BMPs, BMP antagonists contain cystine knots and typically form homo- and heterodimers. The CAN (cerberus and dan) subfamily of BMP antagonists, to which this gene belongs, is characterized by a C-terminal cystine knot with an eight-membered ring. The antagonistic effect of the secreted glycosylated protein encoded by this gene is likely due to its direct binding to BMP proteins. As an antagonist of BMP, this gene may play a role in regulating organogenesis, body patterning, and tissue differentiation. In mouse, this protein has been shown to relay the sonic hedgehog (SHH) signal from the polarizing region to the apical ectodermal ridge during limb bud outgrowth. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. gremlin 1, DAN family BMP antagonist NA
ENSG00000174807 CD248 57124 NA CD248 molecule NA
ENSG00000111341 MGP 4256 The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. matrix Gla protein NA
ENSG00000137801 THBS1 7057 The protein encoded by this gene is a subunit of a disulfide-linked homotrimeric protein. This protein is an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein can bind to fibrinogen, fibronectin, laminin, type V collagen and integrins alpha-V/beta-1. This protein has been shown to play roles in platelet aggregation, angiogenesis, and tumorigenesis. thrombospondin 1 NA
ENSG00000141753 IGFBP4 3487 This gene is a member of the insulin-like growth factor binding protein (IGFBP) family and encodes a protein with an IGFBP domain and a thyroglobulin type-I domain. The protein binds both insulin-like growth factors (IGFs) I and II and circulates in the plasma in both glycosylated and non-glycosylated forms. Binding of this protein prolongs the half-life of the IGFs and alters their interaction with cell surface receptors. insulin like growth factor binding protein 4 NA
ENSG00000138207 RBP4 5950 This protein belongs to the lipocalin family and is the specific carrier for retinol (vitamin A alcohol) in the blood. It delivers retinol from the liver stores to the peripheral tissues. In plasma, the RBP-retinol complex interacts with transthyretin which prevents its loss by filtration through the kidney glomeruli. A deficiency of vitamin A blocks secretion of the binding protein posttranslationally and results in defective delivery and supply to the epidermal cells. retinol binding protein 4 NA
ENSG00000167676 PLIN4 729359 Members of the perilipin family, such as PLIN4, coat intracellular lipid storage droplets (Wolins et al., 2003 [PubMed 12840023]). perilipin 4 NA
ENSG00000096696 DSP 1832 This gene encodes a protein that anchors intermediate filaments to desmosomal plaques and forms an obligate component of functional desmosomes. Mutations in this gene are the cause of several cardiomyopathies and keratodermas, including skin fragility-woolly hair syndrome. Alternative splicing results in multiple transcript variants. desmoplakin NA
ENSG00000123384 LRP1 4035 This gene encodes a member of the low-density lipoprotein receptor family of proteins. The encoded preproprotein is proteolytically processed by furin to generate 515 kDa and 85 kDa subunits that form the mature receptor (PMID: 8546712). This receptor is involved in several cellular processes, including intracellular signaling, lipid homeostasis, and clearance of apoptotic cells. In addition, the encoded protein is necessary for the alpha 2-macroglobulin-mediated clearance of secreted amyloid precursor protein and beta-amyloid, the main component of amyloid plaques found in Alzheimer patients. Expression of this gene decreases with age and has been found to be lower than controls in brain tissue from Alzheimer’s disease patients. LDL receptor related protein 1 NA
ENSG00000125730 C3 718 Complement component C3 plays a central role in the activation of complement system. Its activation is required for both classical and alternative complement activation pathways. The encoded preproprotein is proteolytically processed to generate alpha and beta subunits that form the mature protein, which is then further processed to generate numerous peptide products. The C3a peptide, also known as the C3a anaphylatoxin, modulates inflammation and possesses antimicrobial activity. Mutations in this gene are associated with atypical hemolytic uremic syndrome and age-related macular degeneration in human patients. complement component 3 NA
ENSG00000171476 HOPX 84525 The protein encoded by this gene is a homeodomain protein that lacks certain conserved residues required for DNA binding. It was reported that choriocarcinoma cell lines and tissues failed to express this gene, which suggested the possible involvement of this gene in malignant conversion of placental trophoblasts. Studies in mice suggest that this protein may interact with serum response factor (SRF) and modulate SRF-dependent cardiac-specific gene expression and cardiac development. Multiple alternatively spliced transcript variants have been identified for this gene. HOP homeobox NA
ENSG00000011465 DCN 1634 This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. decorin NA
ENSG00000119927 GPAM 57678 This gene encodes a mitochondrial enzyme which prefers saturated fatty acids as its substrate for the synthesis of glycerolipids. This metabolic pathway’s first step is catalyzed by the encoded enzyme. Two forms for this enzyme exist, one in the mitochondria and one in the endoplasmic reticulum. Two alternatively spliced transcript variants have been described for this gene. glycerol-3-phosphate acyltransferase, mitochondrial NA
ENSG00000256545 NA NA NA NA TRUE
ENSG00000108821 COL1A1 1277 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. collagen type I alpha 1 NA
ENSG00000123358 NR4A1 3164 This gene encodes a member of the steroid-thyroid hormone-retinoid receptor superfamily. Expression is induced by phytohemagglutinin in human lymphocytes and by serum stimulation of arrested fibroblasts. The encoded protein acts as a nuclear transcription factor. Translocation of the protein from the nucleus to mitochondria induces apoptosis. Multiple transcript variants encoding different isoforms have been found for this gene. nuclear receptor subfamily 4 group A member 1 NA
ENSG00000186847 KRT14 3861 This gene encodes a member of the keratin family, the most diverse group of intermediate filaments. This gene product, a type I keratin, is usually found as a heterotetramer with two keratin 5 molecules, a type II keratin. Together they form the cytoskeleton of epithelial cells. Mutations in the genes for these keratins are associated with epidermolysis bullosa simplex. At least one pseudogene has been identified at 17p12-p11. keratin 14 NA
ENSG00000076555 ACACB 32 Acetyl-CoA carboxylase (ACC) is a complex multifunctional enzyme system. ACC is a biotin-containing enzyme which catalyzes the carboxylation of acetyl-CoA to malonyl-CoA, the rate-limiting step in fatty acid synthesis. ACC-beta is thought to control fatty acid oxidation by means of the ability of malonyl-CoA to inhibit carnitine-palmitoyl-CoA transferase I, the rate-limiting step in fatty acid uptake and oxidation by mitochondria. ACC-beta may be involved in the regulation of fatty acid oxidation, rather than fatty acid biosynthesis. There is evidence for the presence of two ACC-beta isoforms. acetyl-CoA carboxylase beta NA
ENSG00000167642 SPINT2 10653 This gene encodes a transmembrane protein with two extracellular Kunitz domains that inhibits a variety of serine proteases. The protein inhibits HGF activator which prevents the formation of active hepatocyte growth factor. This gene is a putative tumor suppressor, and mutations in this gene result in congenital sodium diarrhea. Multiple transcript variants encoding different isoforms have been found for this gene. serine peptidase inhibitor, Kunitz type, 2 NA
ENSG00000163430 FSTL1 11167 This gene encodes a protein with similarity to follistatin, an activin-binding protein. It contains an FS module, a follistatin-like sequence containing 10 conserved cysteine residues. This gene product is thought to be an autoantigen associated with rheumatoid arthritis. follistatin like 1 NA
ENSG00000156508 EEF1A1 1915 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas, and the other isoform (alpha 2) is expressed in brain, heart and skeletal muscle. This isoform is identified as an autoantigen in 66% of patients with Felty syndrome. This gene has been found to have multiple copies on many chromosomes, some of which, if not all, represent different pseudogenes. eukaryotic translation elongation factor 1 alpha 1 NA
ENSG00000120885 CLU 1191 The protein encoded by this gene is a secreted chaperone that can under some stress conditions also be found in the cell cytosol. It has been suggested to be involved in several basic biological events such as cell death, tumor progression, and neurodegenerative disorders. Alternate splicing results in both coding and non-coding variants. clusterin NA
ENSG00000142156 COL6A1 1291 The collagens are a superfamily of proteins that play a role in maintaining the integrity of various tissues. Collagens are extracellular matrix proteins and have a triple-helical domain as their common structural element. Collagen VI is a major structural component of microfibrils. The basic structural unit of collagen VI is a heterotrimer of the alpha1(VI), alpha2(VI), and alpha3(VI) chains. The alpha2(VI) and alpha3(VI) chains are encoded by the COL6A2 and COL6A3 genes, respectively. The protein encoded by this gene is the alpha 1 subunit of type VI collagen (alpha1(VI) chain). Mutations in the genes that code for the collagen VI subunits result in the autosomal dominant disorder, Bethlem myopathy. collagen type VI alpha 1 NA
ENSG00000182326 C1S 716 This gene encodes a serine protease, which is a major constituent of the human complement subcomponent C1. C1s associates with two other complement components C1r and C1q in order to yield the first component of the serum complement system. Defects in this gene are the cause of selective C1s deficiency. complement component 1, s subcomponent NA
ENSG00000117289 NA NA NA NA TRUE
ENSG00000087245 MMP2 4313 This gene is a member of the matrix metalloproteinase (MMP) gene family, that are zinc-dependent enzymes capable of cleaving components of the extracellular matrix and molecules involved in signal transduction. The protein encoded by this gene is a gelatinase A, type IV collagenase, that contains three fibronectin type II repeats in its catalytic site that allow binding of denatured type IV and V collagen and elastin. Unlike most MMP family members, activation of this protein can occur on the cell membrane. This enzyme can be activated extracellularly by proteases, or intracellularly by its S-glutathiolation with no requirement for proteolytical removal of the pro-domain. This protein is thought to be involved in multiple pathways including roles in the nervous system, endometrial menstrual breakdown, regulation of vascularization, and metastasis. Mutations in this gene have been associated with Winchester syndrome and Nodulosis-Arthropathy-Osteolysis (NAO) syndrome. Alternative splicing results in multiple transcript variants encoding different isoforms. matrix metallopeptidase 2 NA
ENSG00000079435 LIPE 3991 The protein encoded by this gene has a long and a short form, generated by use of alternative translational start codons. The long form is expressed in steroidogenic tissues such as testis, where it converts cholesteryl esters to free cholesterol for steroid hormone production. The short form is expressed in adipose tissue, among others, where it hydrolyzes stored triglycerides to free fatty acids. lipase E, hormone sensitive type NA
ENSG00000173801 JUP 3728 This gene encodes a major cytoplasmic protein which is the only known constituent common to submembranous plaques of both desmosomes and intermediate junctions. This protein forms distinct complexes with cadherins and desmosomal cadherins and is a member of the catenin family since it contains a distinct repeating amino acid motif called the armadillo repeat. Mutation in this gene has been associated with Naxos disease. Alternative splicing occurs in this gene; however, not all transcripts have been fully described. junction plakoglobin NA
ENSG00000171401 KRT13 3860 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. keratin 13 NA
ENSG00000174437 ATP2A2 488 This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol into the sarcoplasmic reticulum lumen, and is involved in regulation of the contraction/relaxation cycle. Mutations in this gene cause Darier-White disease, also known as keratosis follicularis, an autosomal dominant skin disorder characterized by loss of adhesion between epidermal cells and abnormal keratinization. Alternative splicing results in multiple transcript variants encoding different isoforms. ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 2 NA
ENSG00000173641 HSPB7 27129 NA heat shock protein family B (small) member 7 NA
ENSG00000091704 CPA1 1357 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. carboxypeptidase A1 NA
ENSG00000107796 ACTA2 59 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. actin, alpha 2, smooth muscle, aorta NA
ENSG00000163513 TGFBR2 7048 This gene encodes a member of the Ser/Thr protein kinase family and the TGFB receptor subfamily. The encoded protein is a transmembrane protein that has a protein kinase domain, forms a heterodimeric complex with another receptor protein, and binds TGF-beta. This receptor/ligand complex phosphorylates proteins, which then enter the nucleus and regulate the transcription of a subset of genes related to cell proliferation. Mutations in this gene have been associated with Marfan Syndrome, Loeys-Dietz Aortic Aneurysm Syndrome, and the development of various types of tumors. Alternatively spliced transcript variants encoding different isoforms have been characterized. transforming growth factor beta receptor 2 NA
ENSG00000135821 GLUL 2752 The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. glutamate-ammonia ligase NA
ENSG00000198363 ASPH 444 This gene is thought to play an important role in calcium homeostasis. The gene is expressed from two promoters and undergoes extensive alternative splicing. The encoded set of proteins share varying amounts of overlap near their N-termini but have substantial variations in their C-terminal domains resulting in distinct functional properties. The longest isoforms (a and f) include a C-terminal Aspartyl/Asparaginyl beta-hydroxylase domain that hydroxylates aspartic acid or asparagine residues in the epidermal growth factor (EGF)-like domains of some proteins, including protein C, coagulation factors VII, IX, and X, and the complement factors C1R and C1S. Other isoforms differ primarily in the C-terminal sequence and lack the hydroxylase domain, and some have been localized to the endoplasmic and sarcoplasmic reticulum. Some of these isoforms are found in complexes with calsequestrin, triadin, and the ryanodine receptor, and have been shown to regulate calcium release from the sarcoplasmic reticulum. Some isoforms have been implicated in metastasis. aspartate beta-hydroxylase NA
ENSG00000143387 CTSK 1513 The protein encoded by this gene is a lysosomal cysteine proteinase involved in bone remodeling and resorption. This protein, which is a member of the peptidase C1 protein family, is predominantly expressed in osteoclasts. However, the encoded protein is also expressed in a significant fraction of human breast cancers, where it could contribute to tumor invasiveness. Mutations in this gene are the cause of pycnodysostosis, an autosomal recessive disease characterized by osteosclerosis and short stature. cathepsin K NA
ENSG00000204983 PRSS1 5644 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. protease, serine 1 NA
ENSG00000008394 MGST1 4257 The MAPEG (Membrane Associated Proteins in Eicosanoid and Glutathione metabolism) family consists of six human proteins, two of which are involved in the production of leukotrienes and prostaglandin E, important mediators of inflammation. Other family members, demonstrating glutathione S-transferase and peroxidase activities, are involved in cellular defense against toxic, carcinogenic, and pharmacologically active electrophilic compounds. This gene encodes a protein that catalyzes the conjugation of glutathione to electrophiles and the reduction of lipid hydroperoxides. This protein is localized to the endoplasmic reticulum and outer mitochondrial membrane where it is thought to protect these membranes from oxidative stress. Several transcript variants, some non-protein coding and some protein coding, have been found for this gene. microsomal glutathione S-transferase 1 NA
ENSG00000167768 KRT1 3848 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 1 NA
ENSG00000146674 IGFBP3 3486 This gene is a member of the insulin-like growth factor binding protein (IGFBP) family and encodes a protein with an IGFBP domain and a thyroglobulin type-I domain. The protein forms a ternary complex with insulin-like growth factor acid-labile subunit (IGFALS) and either insulin-like growth factor (IGF) I or II. In this form, it circulates in the plasma, prolonging the half-life of IGFs and altering their interaction with cell surface receptors. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. insulin like growth factor binding protein 3 NA
ENSG00000135218 CD36 948 The protein encoded by this gene is the fourth major glycoprotein of the platelet surface and serves as a receptor for thrombospondin in platelets and various cell lines. Since thrombospondins are widely distributed proteins involved in a variety of adhesive processes, this protein may have important functions as a cell adhesion molecule. It binds to collagen, thrombospondin, anionic phospholipids and oxidized LDL. It directly mediates cytoadherence of Plasmodium falciparum parasitized erythrocytes and it binds long chain fatty acids and may function in the transport and/or as a regulator of fatty acid transport. Mutations in this gene cause platelet glycoprotein deficiency. Multiple alternatively spliced transcript variants have been found for this gene. CD36 molecule NA
ENSG00000148926 ADM 133 The protein encoded by this gene is a preprohormone which is cleaved to form two biologically active peptides, adrenomedullin and proadrenomedullin N-terminal 20 peptide. Adrenomedullin is a 52 aa peptide with several functions, including vasodilation, regulation of hormone secretion, promotion of angiogenesis, and antimicrobial activity. The antimicrobial activity is antibacterial, as the peptide has been shown to kill E. coli and S. aureus at low concentration. adrenomedullin NA
ENSG00000186395 KRT10 3858 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. keratin 10 NA
ENSG00000077943 ITGA8 8516 Integrins are heterodimeric transmembrane receptor proteins that mediate numerous cellular processes including cell adhesion, cytoskeletal rearrangement, and activation of cell signaling pathways. Integrins are composed of alpha and beta subunits. This gene encodes the alpha 8 subunit of the heterodimeric integrin alpha8beta1 protein. The encoded protein is a single-pass type 1 membrane protein that contains multiple FG-GAP repeats. This repeat is predicted to fold into a beta propeller structure. This gene regulates the recruitment of mesenchymal cells into epithelial structures, mediates cell-cell interactions, and regulates neurite outgrowth of sensory and motor neurons. The integrin alpha8beta1 protein thus plays an important role in wound-healing and organogenesis. Mutations in this gene have been associated with renal hypodysplasia/aplasia-1 (RHDA1) and with several animal models of chronic kidney disease. Alternate splicing results in multiple transcript variants encoding distinct isoforms. integrin subunit alpha 8 NA
ENSG00000124253 PCK1 5105 This gene is a main control point for the regulation of gluconeogenesis. The cytosolic enzyme encoded by this gene, along with GTP, catalyzes the formation of phosphoenolpyruvate from oxaloacetate, with the release of carbon dioxide and GDP. The expression of this gene can be regulated by insulin, glucocorticoids, glucagon, cAMP, and diet. Defects in this gene are a cause of cytosolic phosphoenolpyruvate carboxykinase deficiency. A mitochondrial isozyme of the encoded protein also has been characterized. phosphoenolpyruvate carboxykinase 1 NA
ENSG00000134352 IL6ST 3572 The protein encoded by this gene is a signal transducer shared by many cytokines, including interleukin 6 (IL6), ciliary neurotrophic factor (CNTF), leukemia inhibitory factor (LIF), and oncostatin M (OSM). This protein functions as a part of the cytokine receptor complex. The activation of this protein is dependent upon the binding of cytokines to their receptors. vIL6, a protein related to IL6 and encoded by the Kaposi sarcoma-associated herpesvirus, can bypass the interleukin 6 receptor (IL6R) and directly activate this protein. Knockout studies in mice suggest that this gene plays a critical role in regulating myocyte apoptosis. Alternatively spliced transcript variants have been described. A related pseudogene has been identified on chromosome 17. interleukin 6 signal transducer NA
ENSG00000122786 CALD1 800 This gene encodes a calmodulin- and actin-binding protein that plays an essential role in the regulation of smooth muscle and nonmuscle contraction. The conserved domain of this protein possesses the binding activities to Ca(2+)-calmodulin, actin, tropomyosin, myosin, and phospholipids. This protein is a potent inhibitor of the actin-tropomyosin activated myosin MgATPase, and serves as a mediating factor for Ca(2+)-dependent inhibition of smooth muscle contraction. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. caldesmon 1 NA
ENSG00000111640 GAPDH 2597 This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. glyceraldehyde-3-phosphate dehydrogenase NA
ENSG00000168878 SFTPB 6439 This gene encodes the pulmonary-associated surfactant protein B (SPB), an amphipathic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. The SPB enhances the rate of spreading and increases the stability of surfactant monolayers in vitro. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 1, also called pulmonary alveolar proteinosis due to surfactant protein B deficiency, and are associated with fatal respiratory distress in the neonatal period. Alternatively spliced transcript variants encoding the same protein have been identified. surfactant protein B NA
ENSG00000120708 TGFBI 7045 This gene encodes an RGD-containing protein that binds to type I, II and IV collagens. The RGD motif is found in many extracellular matrix proteins modulating cell adhesion and serves as a ligand recognition sequence for several integrins. This protein plays a role in cell-collagen interactions and may be involved in endochondral bone formation in cartilage. The protein is induced by transforming growth factor-beta and acts to inhibit cell adhesion. Mutations in this gene are associated with multiple types of corneal dystrophy. transforming growth factor beta induced NA
ENSG00000173432 SAA1 6288 This gene encodes a member of the serum amyloid A family of apolipoproteins. The encoded preproprotein is proteolytically processed to generate the mature protein. This protein is a major acute phase protein that is highly expressed in response to inflammation and tissue injury. This protein also plays an important role in HDL metabolism and cholesterol homeostasis. High levels of this protein are associated with chronic inflammatory diseases including atherosclerosis, rheumatoid arthritis, Alzheimer’s disease and Crohn’s disease. This protein may also be a potential biomarker for certain tumors. Alternate splicing results in multiple transcript variants that encode the same protein. A pseudogene of this gene is found on chromosome 11. serum amyloid A1 NA
ENSG00000175535 PNLIP 5406 This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. pancreatic lipase NA
ENSG00000162733 DDR2 4921 Receptor tyrosine kinases (RTKs) play a key role in the communication of cells with their microenvironment. These molecules are involved in the regulation of cell growth, differentiation, and metabolism. In several cases the biochemical mechanism by which RTKs transduce signals across the membrane has been shown to be ligand induced receptor oligomerization and subsequent intracellular phosphorylation. This autophosphorylation leads to phosphorylation of cytosolic targets as well as association with other molecules, which are involved in pleiotropic effects of signal transduction. RTKs have a tripartite structure with extracellular, transmembrane, and cytoplasmic regions. This gene encodes a member of a novel subclass of RTKs and contains a distinct extracellular region encompassing a factor VIII-like domain. Alternative splicing in the 5’ UTR results in multiple transcript variants encoding the same protein. discoidin domain receptor tyrosine kinase 2 NA
ENSG00000196205 EEF1A1P5 ENSG00000196205 NA eukaryotic translation elongation factor 1 alpha 1 pseudogene 5 NA
ENSG00000069702 TGFBR3 7049 This locus encodes the transforming growth factor (TGF)-beta type III receptor. The encoded receptor is a membrane proteoglycan that often functions as a co-receptor with other TGF-beta receptor superfamily members. Ectodomain shedding produces soluble TGFBR3, which may inhibit TGFB signaling. Decreased expression of this receptor has been observed in various cancers. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. transforming growth factor beta receptor 3 NA
ENSG00000189058 APOD 347 This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. apolipoprotein D NA
ENSG00000234745 HLA-B 3106 HLA-B belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon 1 encodes the leader peptide, exon 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Hundreds of HLA-B alleles have been described. major histocompatibility complex, class I, B NA
ENSG00000153071 DAB2 1601 This gene encodes a mitogen-responsive phosphoprotein. It is expressed in normal ovarian epithelial cells, but is down-regulated or absent from ovarian carcinoma cell lines, suggesting its role as a tumor suppressor. This protein binds to the SH3 domains of GRB2, an adaptor protein that couples tyrosine kinase receptors to SOS (a guanine nucleotide exchange factor for Ras), via its C-terminal proline-rich sequences, and may thus modulate growth factor/Ras pathways by competing with SOS for binding to GRB2. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. DAB2, clathrin adaptor protein NA
ENSG00000134013 LOXL2 4017 This gene encodes a member of the lysyl oxidase gene family. The prototypic member of the family is essential to the biogenesis of connective tissue, encoding an extracellular copper-dependent amine oxidase that catalyses the first step in the formation of crosslinks in collagens and elastin. A highly conserved amino acid sequence at the C-terminus end appears to be sufficient for amine oxidase activity, suggesting that each family member may retain this function. The N-terminus is poorly conserved and may impart additional roles in developmental regulation, senescence, tumor suppression, cell growth control, and chemotaxis to each member of the family. lysyl oxidase like 2 NA
ENSG00000153002 CPB1 1360 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma. carboxypeptidase B1 NA
ENSG00000024422 EHD2 30846 This gene encodes a member of the EH domain-containing protein family. These proteins are characterized by a C-terminal EF-hand domain, a nucleotide-binding consensus site at the N terminus and a bipartite nuclear localization signal. The encoded protein interacts with the actin cytoskeleton through an N-terminal domain and also binds to an EH domain-binding protein through the C-terminal EH domain. This interaction appears to connect clathrin-dependent endocytosis to actin, suggesting that this gene product participates in the endocytic pathway. EH domain containing 2 NA
ENSG00000175084 DES 1674 This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. desmin NA
ENSG00000166741 NNMT 4837 N-methylation is one method by which drug and other xenobiotic compounds are metabolized by the liver. This gene encodes the protein responsible for this enzymatic activity which uses S-adenosyl methionine as the methyl donor. nicotinamide N-methyltransferase NA
ENSG00000166147 FBN1 2200 This gene encodes a member of the fibrillin family of proteins. The encoded preproprotein is proteolytically processed to generate two proteins including the extracellular matrix component fibrillin-1 and the protein hormone asprosin. Fibrillin-1 is an extracellular matrix glycoprotein that serves as a structural component of calcium-binding microfibrils. These microfibrils provide force-bearing structural support in elastic and nonelastic connective tissue throughout the body. Asprosin, secreted by white adipose tissue, has been shown to regulate glucose homeostasis. Mutations in this gene are associated with Marfan syndrome and the related MASS phenotype, as well as ectopia lentis syndrome, Weill-Marchesani syndrome, Shprintzen-Goldberg syndrome and neonatal progeroid syndrome. fibrillin 1 NA
ENSG00000169347 GP2 2813 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. glycoprotein 2 NA
ENSG00000115884 SDC1 6382 The protein encoded by this gene is a transmembrane (type I) heparan sulfate proteoglycan and is a member of the syndecan proteoglycan family. The syndecans mediate cell binding, cell signaling, and cytoskeletal organization and syndecan receptors are required for internalization of the HIV-1 tat protein. The syndecan-1 protein functions as an integral membrane protein and participates in cell proliferation, cell migration and cell-matrix interactions via its receptor for extracellular matrix proteins. Altered syndecan-1 expression has been detected in several different tumor types. While several transcript variants may exist for this gene, the full-length natures of only two have been described to date. These two represent the major variants of this gene and encode the same protein. syndecan 1 NA
ENSG00000159403 C1R 715 NA complement C1r subcomponent NA
ENSG00000122729 ACO1 48 The protein encoded by this gene is a bifunctional, cytosolic protein that functions as an essential enzyme in the TCA cycle and interacts with mRNA to control the levels of iron inside cells. When cellular iron levels are high, this protein binds to a 4Fe-4S cluster and functions as an aconitase. Aconitases are iron-sulfur proteins that function to catalyze the conversion of citrate to isocitrate. When cellular iron levels are low, the protein binds to iron-responsive elements (IREs), which are stem-loop structures found in the 5’ UTR of ferritin mRNA, and in the 3’ UTR of transferrin receptor mRNA. When the protein binds to IRE, it results in repression of translation of ferritin mRNA, and inhibition of degradation of the otherwise rapidly degraded transferrin receptor mRNA. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alternative splicing results in multiple transcript variants aconitase 1 NA
ENSG00000136960 ENPP2 5168 The protein encoded by this gene functions as both a phosphodiesterase, which cleaves phosphodiester bonds at the 5’ end of oligonucleotides, and a phospholipase, which catalyzes production of lysophosphatidic acid (LPA) in extracellular fluids. LPA evokes growth factor-like responses including stimulation of cell proliferation and chemotaxis. This gene product stimulates the motility of tumor cells and has angiogenic properties, and its expression is upregulated in several kinds of carcinomas. The gene product is secreted and further processed to make the biologically active form. Several alternatively spliced transcript variants encoding different isoforms have been identified. ectonucleotide pyrophosphatase/phosphodiesterase 2 NA
ENSG00000152377 SPOCK1 6695 This gene encodes the protein core of a seminal plasma proteoglycan containing chondroitin- and heparan-sulfate chains. The protein’s function is unknown, although similarity to thyropin-type cysteine protease-inhibitors suggests its function may be related to protease inhibition. sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 1 NA
ENSG00000211896 IGHG1 ENSG00000211896 NA immunoglobulin heavy constant gamma 1 (G1m marker) NA
ENSG00000161249 DMKN 93099 This gene is upregulated in inflammatory diseases, and it was first observed as expressed in the differentiated layers of skin. The most interesting aspect of this gene is the differential use of promoters and terminators to generate isoforms with unique cellular distributions and domain components. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. dermokine NA
ENSG00000167658 EEF2 1938 This gene encodes a member of the GTP-binding translation elongation factor family. This protein is an essential factor for protein synthesis. It promotes the GTP-dependent translocation of the nascent protein chain from the A-site to the P-site of the ribosome. This protein is completely inactivated by EF-2 kinase phosphorylation. eukaryotic translation elongation factor 2 NA
ENSG00000157150 TIMP4 7079 This gene belongs to the TIMP gene family. The proteins encoded by this gene family are inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix. The secreted, netrin domain-containing protein encoded by this gene is involved in regulation of platelet aggregation and recruitment and may play a role in hormonal regulation and endometrial tissue remodeling. TIMP metallopeptidase inhibitor 4 NA
ENSG00000164733 CTSB 1508 This gene encodes a member of the C1 family of peptidases. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate multiple protein products. These products include the cathepsin B light and heavy chains, which can dimerize to form the double chain form of the enzyme. This enzyme is a lysosomal cysteine protease with both endopeptidase and exopeptidase activity that may play a role in protein turnover. It is also known as amyloid precursor protein secretase and is involved in the proteolytic processing of amyloid precursor protein (APP). Incomplete proteolytic processing of APP has been suggested to be a causative factor in Alzheimer’s disease, the most common cause of dementia. Overexpression of the encoded protein has been associated with esophageal adenocarcinoma and other tumors. Multiple pseudogenes of this gene have been identified. cathepsin B NA
ENSG00000189184 PCDH18 54510 This gene belongs to the protocadherin gene family, a subfamily of the cadherin superfamily. This gene encodes a protein which contains 6 extracellular cadherin domains, a transmembrane domain and a cytoplasmic tail differing from those of the classical cadherins. Although its specific function is undetermined, the cadherin-related neuronal receptor is thought to play a role in the establishment and function of specific cell-cell connections in the brain. protocadherin 18 NA
ENSG00000142789 CELA3A 10136 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. chymotrypsin like elastase family member 3A NA
ENSG00000138758 SEPT11 55752 SEPT11 belongs to the conserved septin family of filament-forming cytoskeletal GTPases that are involved in a variety of cellular functions including cytokinesis and vesicle trafficking (Hanai et al., 2004 [PubMed 15196925]; Nagata et al., 2004 [PubMed 15485874]). septin 11 NA
ENSG00000065534 MYLK 4638 This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3’ region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts. myosin light chain kinase NA
ENSG00000009413 REV3L 5980 NA REV3 like, DNA directed polymerase zeta catalytic subunit NA
ENSG00000170477 KRT4 3851 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 4 NA
ENSG00000116285 ERRFI1 54206 ERRFI1 is a cytoplasmic protein whose expression is upregulated with cell growth (Wick et al., 1995 [PubMed 7641805]). It shares significant homology with the protein product of rat gene-33, which is induced during cell stress and mediates cell signaling (Makkinje et al., 2000 [PubMed 10749885]; Fiorentino et al., 2000 [PubMed 11003669]). ERBB receptor feedback inhibitor 1 NA
ENSG00000122304 PRM2 5620 Protamines substitute for histones in the chromatin of sperm during the haploid phase of spermatogenesis, and are the major DNA-binding proteins in the nucleus of sperm in many vertebrates. They package the sperm DNA into a highly condensed complex in a volume less than 5% of a somatic cell nucleus. Many mammalian species have only one protamine (protamine 1); however, a few species, including human and mouse, have two. This gene encodes protamine 2, which is cleaved to give rise to a family of protamine 2 peptides. Alternatively spliced transcript variants have also been found for this gene. protamine 2 NA
ENSG00000170835 CEL 1056 The protein encoded by this gene is a glycoprotein secreted from the pancreas into the digestive tract and from the lactating mammary gland into human milk. The physiological role of this protein is in cholesterol and lipid-soluble vitamin ester hydrolysis and absorption. This encoded protein promotes large chylomicron production in the intestine. Also its presence in plasma suggests its interactions with cholesterol and oxidized lipoproteins to modulate the progression of atherosclerosis. In pancreatic tumoral cells, this encoded protein is thought to be sequestrated within the Golgi compartment and is probably not secreted. This gene contains a variable number of tandem repeat (VNTR) polymorphism in the coding region that may influence the function of the encoded protein. carboxyl ester lipase NA
ENSG00000091986 CCDC80 151887 NA coiled-coil domain containing 80 NA
ENSG00000187288 CIDEC 63924 This gene encodes a member of the cell death-inducing DNA fragmentation factor-like effector family. Members of this family play important roles in apoptosis. The encoded protein promotes lipid droplet formation in adipocytes and may mediate adipocyte apoptosis. This gene is regulated by insulin and its expression is positively correlated with insulin sensitivity. Mutations in this gene may contribute to insulin resistant diabetes. A pseudogene of this gene is located on the short arm of chromosome 3. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. cell death inducing DFFA like effector c NA
ENSG00000169692 AGPAT2 10555 This gene encodes a member of the 1-acylglycerol-3-phosphate O-acyltransferase family. The protein is located within the endoplasmic reticulum membrane and converts lysophosphatidic acid to phosphatidic acid, the second step in de novo phospholipid biosynthesis. Mutations in this gene have been associated with congenital generalized lipodystrophy (CGL), or Berardinelli-Seip syndrome, a disease characterized by a near absence of adipose tissue and severe insulin resistance. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. 1-acylglycerol-3-phosphate O-acyltransferase 2 NA
ENSG00000104879 CKM 1158 The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. creatine kinase, M-type NA
ENSG00000151726 ACSL1 2180 The protein encoded by this gene is an isozyme of the long-chain fatty-acid-coenzyme A ligase family. Although differing in substrate specificity, subcellular localization, and tissue distribution, all isozymes of this family convert free long-chain fatty acids into fatty acyl-CoA esters, and thereby play a key role in lipid biosynthesis and fatty acid degradation. Several transcript variants encoding different isoforms have been found for this gene. acyl-CoA synthetase long-chain family member 1 NA
ENSG00000140988 RPS2 6187 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S5P family of ribosomal proteins. It is located in the cytoplasm. This gene shares sequence similarity with mouse LLRep3. It is co-transcribed with the small nucleolar RNA gene U64, which is located in its third intron. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. ribosomal protein S2 NA
ENSG00000197766 CFD 1675 This gene encodes a member of the S1, or chymotrypsin, family of serine peptidases. This protease catalyzes the cleavage of factor B, the rate-limiting step of the alternative pathway of complement activation. This protein also functions as an adipokine, a cell signaling protein secreted by adipocytes, which regulates insulin secretion in mice. Mutations in this gene underlie complement factor D deficiency, which is associated with recurrent bacterial meningitis infections in human patients. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate the mature protease. complement factor D NA
# Write the queried Ensembl IDs to the cluster 4 gene-name file, one ID per line
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_",4,".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);
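
The file written above takes its IDs from `out$query`. `mygene::queryMany` can return more than one row for a single query ID (the console message about `returnall=TRUE` refers to duplicate or missing query terms), so the written list may contain repeated IDs. The following is a minimal sketch of checking for and dropping such repeats. It uses a toy data frame with illustrative contents in place of the real query result, and the names `out_df`, `dup_queries` and `unique_queries` are hypothetical, not part of the original analysis.

```r
# Toy stand-in for as.data.frame(out); the rows are illustrative only
out_df <- data.frame(
  query  = c("ENSG00000198668", "ENSG00000198668", "ENSG00000132639"),
  symbol = c("CALM1", "CALM2", "SNAP25"),
  stringsAsFactors = FALSE
)

# IDs that occur on more than one row (one-to-many mappings)
dup_queries <- unique(out_df$query[duplicated(out_df$query)])

# Each queried ID once, in order of first appearance
unique_queries <- unique(out_df$query)

print(dup_queries)
print(unique_queries)
```

Passing `unique_queries` rather than `out$query` to `write.table` would give one line per queried gene. Whether that is preferable depends on how the downstream tool treats repeated IDs.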

Factor 5 Annotations

out <- mygene::queryMany(gene_list[5,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query X_id name summary symbol
ENSG00000132639 6616 synaptosome associated protein 25 Synaptic vesicle membrane docking and fusion is mediated by SNAREs (soluble N-ethylmaleimide-sensitive factor attachment protein receptors) located on the vesicle membrane (v-SNAREs) and the target membrane (t-SNAREs). The assembled v-SNARE/t-SNARE complex consists of a bundle of four helices, one of which is supplied by v-SNARE and the other three by t-SNARE. For t-SNAREs on the plasma membrane, the protein syntaxin supplies one helix and the protein encoded by this gene contributes the other two. Therefore, this gene product is a presynaptic plasma membrane protein involved in the regulation of neurotransmitter release. Two alternative transcript variants encoding different protein isoforms have been described for this gene. SNAP25
ENSG00000244734 3043 hemoglobin subunit beta The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. HBB
ENSG00000198668 801 calmodulin 1 (phosphorylase kinase, delta) This gene encodes a member of the EF-hand calcium-binding protein family. It is one of three genes which encode an identical calcium binding protein which is one of the four subunits of phosphorylase kinase. Two pseudogenes have been identified on chromosome 7 and X. Multiple transcript variants encoding different isoforms have been found for this gene. CALM1
ENSG00000198668 805 calmodulin 2 (phosphorylase kinase, delta) This gene is a member of the calmodulin gene family. There are three distinct calmodulin genes dispersed throughout the genome that encode the identical protein, but differ at the nucleotide level. Calmodulin is a calcium binding protein that plays a role in signaling pathways, cell cycle progression and proliferation. Several infants with severe forms of long-QT syndrome (LQTS) who displayed life-threatening ventricular arrhythmias together with delayed neurodevelopment and epilepsy were found to have mutations in either this gene or another member of the calmodulin gene family (PMID:23388215). Mutations in this gene have also been identified in patients with less severe forms of LQTS (PMID:24917665), while mutations in another calmodulin gene family member have been associated with catecholaminergic polymorphic ventricular tachycardia (CPVT)(PMID:23040497), a rare disorder thought to be the cause of a significant fraction of sudden cardiac deaths in young individuals. Pseudogenes of this gene are found on chromosomes 10, 13, and 17. Alternative splicing results in multiple transcript variants encoding different isoforms. CALM2
ENSG00000108821 1277 collagen type I alpha 1 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. COL1A1
ENSG00000101210 1917 eukaryotic translation elongation factor 1 alpha 2 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 2) is expressed in brain, heart and skeletal muscle, and the other isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas. This gene may be critical in the development of ovarian cancer. EEF1A2
ENSG00000104833 10382 tubulin beta 4A class IVa This gene encodes a member of the beta tubulin family. Beta tubulins are one of two core protein families (alpha and beta tubulins) that heterodimerize and assemble to form microtubules. Mutations in this gene cause hypomyelinating leukodystrophy-6 and autosomal dominant torsion dystonia-4. Alternate splicing results in multiple transcript variants encoding different isoforms. A pseudogene of this gene is found on chromosome X. TUBB4A
ENSG00000155980 3798 kinesin family member 5A This gene encodes a member of the kinesin family of proteins. Members of this family are part of a multisubunit complex that functions as a microtubule motor in intracellular organelle transport. Mutations in this gene cause autosomal dominant spastic paraplegia 10. KIF5A
ENSG00000018625 477 ATPase Na+/K+ transporting subunit alpha 2 The protein encoded by this gene belongs to the family of P-type cation transport ATPases, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The catalytic subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes an alpha 2 subunit. Mutations in this gene result in familial basilar or hemiplegic migraines, and in a rare syndrome known as alternating hemiplegia of childhood. ATP1A2
ENSG00000103034 65009 NDRG family member 4 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that is required for cell cycle progression and survival in primary astrocytes and may be involved in the regulation of mitogenic signalling in vascular smooth muscles cells. Alternative splicing results in multiple transcripts encoding different isoforms. NDRG4
ENSG00000166710 567 beta-2-microglobulin This gene encodes a serum protein found in association with the major histocompatibility complex (MHC) class I heavy chain on the surface of nearly all nucleated cells. The protein has a predominantly beta-pleated sheet structure that can form amyloid fibrils in some pathological conditions. The encoded antimicrobial protein displays antibacterial activity in amniotic fluid. A mutation in this gene has been shown to result in hypercatabolic hypoproteinemia. B2M
ENSG00000058404 816 calcium/calmodulin dependent protein kinase II beta The product of this gene belongs to the serine/threonine protein kinase family and to the Ca(2+)/calmodulin-dependent protein kinase subfamily. Calcium signaling is crucial for several aspects of plasticity at glutamatergic synapses. In mammalian cells, the enzyme is composed of four different chains: alpha, beta, gamma, and delta. The product of this gene is a beta chain. It is possible that distinct isoforms of this chain have different cellular localizations and interact differently with calmodulin. Alternative splicing results in multiple transcript variants. CAMK2B
ENSG00000166165 1152 creatine kinase B The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in brain as well as in other tissues, and as a heterodimer with a similar muscle isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. A pseudogene of this gene has been characterized. CKB
ENSG00000104888 57030 solute carrier family 17 member 7 The protein encoded by this gene is a vesicle-bound, sodium-dependent phosphate transporter that is specifically expressed in the neuron-rich regions of the brain. It is preferentially associated with the membranes of synaptic vesicles and functions in glutamate transport. The protein shares 82% identity with the differentiation-associated Na-dependent inorganic phosphate cotransporter and they appear to form a distinct class within the Na+/Pi cotransporter family. SLC17A7
ENSG00000109107 230 aldolase, fructose-bisphosphate C This gene encodes a member of the class I fructose-biphosphate aldolase gene family. Expressed specifically in the hippocampus and Purkinje cells of the brain, the encoded protein is a glycolytic enzyme that catalyzes the reversible aldol cleavage of fructose-1,6-biphosphate and fructose 1-phosphate to dihydroxyacetone phosphate and either glyceraldehyde-3-phosphate or glyceraldehyde, respectively. ALDOC
ENSG00000142173 1292 collagen type VI alpha 2 This gene encodes one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The product of this gene contains several domains similar to von Willebrand Factor type A domains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in this gene are associated with Bethlem myopathy and Ullrich scleroatonic muscular dystrophy. Three transcript variants have been identified for this gene. COL6A2
ENSG00000163032 7447 visinin like 1 This gene is a member of the visinin/recoverin subfamily of neuronal calcium sensor proteins. The encoded protein is strongly expressed in granule cells of the cerebellum where it associates with membranes in a calcium-dependent manner and modulates intracellular signaling pathways of the central nervous system by directly or indirectly regulating the activity of adenylyl cyclase. Alternatively spliced transcript variants have been observed, but their full-length nature has not been determined. VSNL1
ENSG00000168542 1281 collagen type III alpha 1 chain This gene encodes the pro-alpha1 chains of type III collagen, a fibrillar collagen that is found in extensible connective tissues such as skin, lung, uterus, intestine and the vascular system, frequently in association with type I collagen. Mutations in this gene are associated with Ehlers-Danlos syndrome types IV, and with aortic and arterial aneurysms. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. COL3A1
ENSG00000188536 3040 hemoglobin subunit alpha 2 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. HBA2
ENSG00000139970 6252 reticulon 1 This gene belongs to the family of reticulon encoding genes. Reticulons are associated with the endoplasmic reticulum, and are involved in neuroendocrine secretion or in membrane trafficking in neuroendocrine cells. This gene is considered to be a specific marker for neurological diseases and cancer, and is a potential molecular target for therapy. Alternative splicing results in multiple transcript variants. RTN1
ENSG00000168490 9796 phytanoyl-CoA 2-hydroxylase interacting protein NA PHYHIP
ENSG00000165795 57447 NDRG family member 2 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that may play a role in neurite outgrowth. This gene may be involved in glioblastoma carcinogenesis. Several alternatively spliced transcript variants of this gene have been described, but the full-length nature of some of these variants has not been determined. NDRG2
ENSG00000168280 3800 kinesin family member 5C The protein encoded by this gene is a kinesin heavy chain subunit involved in the transport of cargo within the central nervous system. The encoded protein, which acts as a tetramer by associating with another heavy chain and two light chains, interacts with protein kinase CK2. Mutations in this gene have been associated with complex cortical dysplasia with other brain malformations-2. Two transcript variants, one protein-coding and the other non-protein coding, have been found for this gene. KIF5C
ENSG00000079215 6507 solute carrier family 1 member 3 This gene encodes a member of a high affinity glutamate transporter family. This gene functions in the termination of excitatory neurotransmission in the central nervous system. Mutations are associated with episodic ataxia, Type 6. Alternative splicing results in multiple transcript variants. SLC1A3
ENSG00000089199 1114 chromogranin B This gene encodes a tyrosine-sulfated secretory protein abundant in peptidergic endocrine cells and neurons. This protein may serve as a precursor for regulatory peptides. CHGB
ENSG00000166963 4130 microtubule associated protein 1A This gene encodes a protein that belongs to the microtubule-associated protein family. The proteins of this family are thought to be involved in microtubule assembly, which is an essential step in neurogenesis. The product of this gene is a precursor polypeptide that presumably undergoes proteolytic processing to generate the final MAP1A heavy chain and LC2 light chain. Expression of this gene is almost exclusively in the brain. Studies of the rat microtubule-associated protein 1A gene suggested a role in early events of spinal cord development. MAP1A
ENSG00000131095 2670 glial fibrillary acidic protein This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. GFAP
ENSG00000136854 6812 syntaxin binding protein 1 This gene encodes a syntaxin-binding protein. The encoded protein appears to play a role in release of neurotransmitters via regulation of syntaxin, a transmembrane attachment protein receptor. Mutations in this gene have been associated with infantile epileptic encephalopathy-4. Alternatively spliced transcript variants have been described. STXBP1
ENSG00000106976 1759 dynamin 1 This gene encodes a member of the dynamin subfamily of GTP-binding proteins. The encoded protein possesses unique mechanochemical properties used to tubulate and sever membranes, and is involved in clathrin-mediated endocytosis and other vesicular trafficking processes. Actin and other cytoskeletal proteins act as binding partners for the encoded protein, which can also self-assemble leading to stimulation of GTPase activity. More than sixty highly conserved copies of the 3’ region of this gene are found elsewhere in the genome, particularly on chromosomes Y and 15. Alternatively spliced transcript variants encoding different isoforms have been described. DNM1
ENSG00000127585 146330 F-box and leucine rich repeat protein 16 Members of the F-box protein family, such as FBXL16, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]). FBXL16
ENSG00000145362 287 ankyrin 2, neuronal This gene encodes a member of the ankyrin family of proteins that link the integral membrane proteins to the underlying spectrin-actin cytoskeleton. Ankyrins play key roles in activities such as cell motility, activation, proliferation, contact and the maintenance of specialized membrane domains. Most ankyrins are typically composed of three structural domains: an amino-terminal domain containing multiple ankyrin repeats; a central region with a highly conserved spectrin binding domain; and a carboxy-terminal regulatory domain which is the least conserved and subject to variation. The protein encoded by this gene is required for targeting and stability of Na/Ca exchanger 1 in cardiomyocytes. Mutations in this gene cause long QT syndrome 4 and cardiac arrhythmia syndrome. Multiple transcript variants encoding different isoforms have been described. ANK2
ENSG00000104435 11075 stathmin 2 This gene encodes a member of the stathmin family of phosphoproteins. Stathmin proteins function in microtubule dynamics and signal transduction. The encoded protein plays a regulatory role in neuronal growth and is also thought to be involved in osteogenesis. Reductions in the expression of this gene have been associated with Down’s syndrome and Alzheimer’s disease. Alternatively spliced transcript variants have been observed for this gene. A pseudogene of this gene is located on the long arm of chromosome 6. STMN2
ENSG00000168309 11170 family with sequence similarity 107 member A NA FAM107A
ENSG00000100321 9145 synaptogyrin 1 This gene encodes an integral membrane protein associated with presynaptic vesicles in neuronal cells. The exact function of this protein is unclear, but studies of a similar murine protein suggest that it functions in synaptic plasticity without being required for synaptic transmission. The gene product belongs to the synaptogyrin gene family. Three alternatively spliced variants encoding three different isoforms have been identified. SYNGR1
ENSG00000143847 8497 PTPRF interacting protein alpha 4 PPFIA4, or liprin-alpha-4, belongs to the liprin-alpha gene family. See liprin-alpha-1 (LIP1, or PPFIA1; MIM 611054) for background on liprins. PPFIA4
ENSG00000100345 4627 myosin, heavy chain 9, non-muscle This gene encodes a conventional non-muscle myosin; this protein should not be confused with the unconventional myosin-9a or 9b (MYO9A or MYO9B). The encoded protein is a myosin IIA heavy chain that contains an IQ domain and a myosin head-like domain which is involved in several important functions, including cytokinesis, cell motility and maintenance of cell shape. Defects in this gene have been associated with non-syndromic sensorineural deafness autosomal dominant type 17, Epstein syndrome, Alport syndrome with macrothrombocytopenia, Sebastian syndrome, Fechtner syndrome and macrothrombocytopenia with progressive sensorineural deafness. MYH9
ENSG00000074317 6620 synuclein beta This gene encodes a member of a small family of proteins that inhibit phospholipase D2 and may function in neuronal plasticity. The encoded protein is abundant in lesions of patients with Alzheimer disease. A mutation in this gene was found in individuals with dementia with Lewy bodies. Alternative splicing results in multiple transcript variants. SNCB
ENSG00000160014 808 calmodulin 3 (phosphorylase kinase, delta) NA CALM3
ENSG00000160014 805 calmodulin 2 (phosphorylase kinase, delta) This gene is a member of the calmodulin gene family. There are three distinct calmodulin genes dispersed throughout the genome that encode the identical protein, but differ at the nucleotide level. Calmodulin is a calcium binding protein that plays a role in signaling pathways, cell cycle progression and proliferation. Several infants with severe forms of long-QT syndrome (LQTS) who displayed life-threatening ventricular arrhythmias together with delayed neurodevelopment and epilepsy were found to have mutations in either this gene or another member of the calmodulin gene family (PMID:23388215). Mutations in this gene have also been identified in patients with less severe forms of LQTS (PMID:24917665), while mutations in another calmodulin gene family member have been associated with catecholaminergic polymorphic ventricular tachycardia (CPVT)(PMID:23040497), a rare disorder thought to be the cause of a significant fraction of sudden cardiac deaths in young individuals. Pseudogenes of this gene are found on chromosomes 10, 13, and 17. Alternative splicing results in multiple transcript variants encoding different isoforms. CALM2
ENSG00000105696 25789 transmembrane protein 59 like This gene encodes a predicted type-I membrane glycoprotein. The encoded protein may play a role in functioning of the central nervous system. TMEM59L
ENSG00000135709 9764 KIAA0513 NA KIAA0513
ENSG00000099365 112755 syntaxin 1B The protein encoded by this gene belongs to a family of proteins thought to play a role in the exocytosis of synaptic vesicles. Vesicle exocytosis releases vesicular contents and is important to various cellular functions. For instance, the secretion of transmitters from neurons plays an important role in synaptic transmission. After exocytosis, the membrane and proteins from the vesicle are retrieved from the plasma membrane through the process of endocytosis. Mutations in this gene have been identified as one cause of fever-associated epilepsy syndromes. A possible link between this gene and Parkinson’s disease has also been suggested. STX1B
ENSG00000063180 770 carbonic anhydrase 11 Carbonic anhydrases (CAs) are a large family of zinc metalloenzymes that catalyze the reversible hydration of carbon dioxide. They participate in a variety of biological processes, including respiration, calcification, acid-base balance, bone resorption, and the formation of aqueous humor, cerebrospinal fluid, saliva, and gastric acid. They show extensive diversity in tissue distribution and in their subcellular localization. CA XI is likely a secreted protein, however, radical changes at active site residues completely conserved in CA isozymes with catalytic activity, make it unlikely that it has carbonic anhydrase activity. It shares properties in common with two other acatalytic CA isoforms, CA VIII and CA X. CA XI is most abundantly expressed in brain, and may play a general role in the central nervous system. CA11
ENSG00000107796 59 actin, alpha 2, smooth muscle, aorta The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. ACTA2
ENSG00000137801 7057 thrombospondin 1 The protein encoded by this gene is a subunit of a disulfide-linked homotrimeric protein. This protein is an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein can bind to fibrinogen, fibronectin, laminin, type V collagen and integrins alpha-V/beta-1. This protein has been shown to play roles in platelet aggregation, angiogenesis, and tumorigenesis. THBS1
ENSG00000109472 1363 carboxypeptidase E This gene encodes a member of the M14 family of metallocarboxypeptidases. The encoded preproprotein is proteolytically processed to generate the mature peptidase. This peripheral membrane protein cleaves C-terminal amino acid residues and is involved in the biosynthesis of peptide hormones and neurotransmitters, including insulin. This protein may also function independently of its peptidase activity, as a neurotrophic factor that promotes neuronal survival, and as a sorting receptor that binds to regulated secretory pathway proteins, including prohormones. Mutations in this gene are implicated in type 2 diabetes. CPE
ENSG00000088899 9762 leucine zipper, putative tumor suppressor family member 3 NA LZTS3
ENSG00000198794 192683 secretory carrier membrane protein 5 NA SCAMP5
ENSG00000129244 482 ATPase Na+/K+ transporting subunit beta 2 The protein encoded by this gene belongs to the family of Na+/K+ and H+/K+ ATPases beta chain proteins, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The beta subunit regulates, through assembly of alpha/beta heterodimers, the number of sodium pumps transported to the plasma membrane. The glycoprotein subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes a beta 2 subunit. Two transcript variants encoding different isoforms have been found for this gene. ATP1B2
ENSG00000204592 3133 major histocompatibility complex, class I, E HLA-E belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. HLA-E binds a restricted subset of peptides derived from the leader peptides of other class I molecules. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon one encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region, and exons 6 and 7 encode the cytoplasmic tail. HLA-E
ENSG00000111674 2026 enolase 2 This gene encodes one of the three enolase isoenzymes found in mammals. This isoenzyme, a homodimer, is found in mature neurons and cells of neuronal origin. A switch from alpha enolase to gamma enolase occurs in neural tissue during development in rats and primates. ENO2
ENSG00000008735 23542 mitogen-activated protein kinase 8 interacting protein 2 The protein encoded by this gene is closely related to MAPK8IP1/IB1/JIP-1, a scaffold protein that is involved in the c-Jun amino-terminal kinase signaling pathway. This protein is expressed in brain and pancreatic cells. It has been shown to interact with, and regulate the activity of MAPK8/JNK1, and MAP2K7/MKK7 kinases. This protein thus is thought to function as a regulator of signal transduction by protein kinase cascade in brain and pancreatic beta-cells. MAPK8IP2
ENSG00000011465 1634 decorin This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. DCN
ENSG00000087245 4313 matrix metallopeptidase 2 This gene is a member of the matrix metalloproteinase (MMP) gene family, that are zinc-dependent enzymes capable of cleaving components of the extracellular matrix and molecules involved in signal transduction. The protein encoded by this gene is a gelatinase A, type IV collagenase, that contains three fibronectin type II repeats in its catalytic site that allow binding of denatured type IV and V collagen and elastin. Unlike most MMP family members, activation of this protein can occur on the cell membrane. This enzyme can be activated extracellularly by proteases, or, intracellulary by its S-glutathiolation with no requirement for proteolytical removal of the pro-domain. This protein is thought to be involved in multiple pathways including roles in the nervous system, endometrial menstrual breakdown, regulation of vascularization, and metastasis. Mutations in this gene have been associated with Winchester syndrome and Nodulosis-Arthropathy-Osteolysis (NAO) syndrome. Alternative splicing results in multiple transcript variants encoding different isoforms. MMP2
ENSG00000182718 302 annexin A2 This gene encodes a member of the annexin family. Members of this calcium-dependent phospholipid-binding protein family play a role in the regulation of cellular growth and in signal transduction pathways. This protein functions as an autocrine factor which heightens osteoclast formation and bone resorption. This gene has three pseudogenes located on chromosomes 4, 9 and 10, respectively. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. ANXA2
ENSG00000077942 2192 fibulin 1 Fibulin 1 is a secreted glycoprotein that becomes incorporated into a fibrillar extracellular matrix. Calcium-binding is apparently required to mediate its binding to laminin and nidogen. It mediates platelet adhesion via binding fibrinogen. Four splice variants which differ in the 3’ end have been identified. Each variant encodes a different isoform, but no functional distinctions have been identified among the four variants. FBLN1
ENSG00000019582 972 CD74 molecule The protein encoded by this gene associates with class II major histocompatibility complex (MHC) and is an important chaperone that regulates antigen presentation for immune response. It also serves as cell surface receptor for the cytokine macrophage migration inhibitory factor (MIF) which, when bound to the encoded protein, initiates survival pathways and cell proliferation. This protein also interacts with amyloid precursor protein (APP) and suppresses the production of amyloid beta (Abeta). Multiple alternatively spliced transcript variants encoding different isoforms have been identified. CD74
ENSG00000054523 23095 kinesin family member 1B This gene encodes a motor protein that transports mitochondria and synaptic vesicle precursors. Mutations in this gene cause Charcot-Marie-Tooth disease, type 2A1. KIF1B
ENSG00000110076 9379 neurexin 2 This gene encodes a member of the neurexin gene family. The products of these genes function as cell adhesion molecules and receptors in the vertebrate nervous system. These genes utilize two promoters. The majority of transcripts are produced from the upstream promoter and encode alpha-neurexin isoforms while a smaller number of transcripts are produced from the downstream promoter and encode beta-neurexin isoforms. The alpha-neurexins contain epidermal growth factor-like (EGF-like) sequences and laminin G domains, and have been shown to interact with neurexophilins. The beta-neurexins lack EGF-like sequences and contain fewer laminin G domains than alpha-neurexins. Alternative splicing and the use of alternative promoters may generate thousands of transcript variants (PMID: 12036300, PMID: 11944992). NRXN2
ENSG00000107130 23413 neuronal calcium sensor 1 This gene is a member of the neuronal calcium sensor gene family, which encode calcium-binding proteins expressed predominantly in neurons. The protein encoded by this gene regulates G protein-coupled receptor phosphorylation in a calcium-dependent manner and can substitute for calmodulin. The protein is associated with secretory granules and modulates synaptic transmission and synaptic plasticity. Multiple transcript variants encoding different isoforms have been found for this gene. NCS1
ENSG00000179456 10472 zinc finger and BTB domain containing 18 This gene encodes a C2H2-type zinc finger protein which acts as a transcriptional repressor of genes involved in neuronal development. The encoded protein recognizes a specific sequence motif and recruits components of chromatin to target genes. Alternative splicing results in multiple transcript variants. ZBTB18
ENSG00000124942 79026 AHNAK nucleoprotein NA AHNAK
ENSG00000156011 23362 pleckstrin and Sec7 domain containing 3 NA PSD3
ENSG00000008710 5310 polycystin 1, transient receptor potential channel interacting This gene encodes a member of the polycystin protein family. The encoded glycoprotein contains a large N-terminal extracellular region, multiple transmembrane domains and a cytoplasmic C-tail. It is an integral membrane protein that functions as a regulator of calcium permeable cation channels and intracellular calcium homoeostasis. It is also involved in cell-cell/matrix interactions and may modulate G-protein-coupled signal-transduction pathways. It plays a role in renal tubular development, and mutations in this gene cause autosomal dominant polycystic kidney disease type 1 (ADPKD1). ADPKD1 is characterized by the growth of fluid-filled cysts that replace normal renal tissue and result in end-stage renal failure. Splice variants encoding different isoforms have been noted for this gene. Also, six pseudogenes, closely linked in a known duplicated region on chromosome 16p, have been described. PKD1
ENSG00000026025 7431 vimentin This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. VIM
ENSG00000106624 165 AE binding protein 1 This gene encodes a member of carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. AEBP1
ENSG00000171130 155066 ATPase H+ transporting V0 subunit e2 Multisubunit vacuolar-type proton pumps, or H(+)-ATPases, acidify various intracellular compartments, such as vacuoles, clathrin-coated and synaptic vesicles, endosomes, lysosomes, and chromaffin granules. H(+)-ATPases are also found in plasma membranes of specialized cells, where they play roles in urinary acidification, bone resorption, and sperm maturation. Multiple subunits form H(+)-ATPases, with proteins of the V1 class hydrolyzing ATP for energy to transport H+, and proteins of the V0 class forming an integral membrane domain through which H+ is transported. ATP6V0E2 encodes an isoform of the H(+)-ATPase V0 e subunit, an essential proton pump component (Blake-Palmer et al., 2007 [PubMed 17350184]). ATP6V0E2
ENSG00000184524 51286 cell cycle exit and neuronal differentiation 1 The protein encoded by this gene is a neuron-specific protein. The similar protein in pig enhances neuroblastoma cell differentiation in vitro and may be involved in neuronal differentiation in vivo. Multiple pseudogenes have been reported for this gene. CEND1
ENSG00000170027 7532 tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein gamma This gene product belongs to the 14-3-3 family of proteins which mediate signal transduction by binding to phosphoserine-containing proteins. This highly conserved protein family is found in both plants and mammals, and this protein is 100% identical to the rat ortholog. It is induced by growth factors in human vascular smooth muscle cells, and is also highly expressed in skeletal and heart muscles, suggesting an important role for this protein in muscle tissue. It has been shown to interact with RAF1 and protein kinase C, proteins involved in various signal transduction pathways. YWHAG
ENSG00000133318 10313 reticulon 3 This gene belongs to the reticulon family of highly conserved genes that are preferentially expressed in neuroendocrine tissues. This family of proteins interact with, and modulate the activity of beta-amyloid converting enzyme 1 (BACE1), and the production of amyloid-beta. An increase in the expression of any reticulon protein substantially reduces the production of amyloid-beta, suggesting that reticulon proteins are negative modulators of BACE1 in cells. Alternatively spliced transcript variants encoding different isoforms have been found for this gene, and pseudogenes of this gene are located on chromosomes 4 and 12. RTN3
ENSG00000125814 63908 NSF attachment protein beta NA NAPB
ENSG00000131711 4131 microtubule associated protein 1B This gene encodes a protein that belongs to the microtubule-associated protein family. The proteins of this family are thought to be involved in microtubule assembly, which is an essential step in neurogenesis. The product of this gene is a precursor polypeptide that presumably undergoes proteolytic processing to generate the final MAP1B heavy chain and LC1 light chain. Gene knockout studies of the mouse microtubule-associated protein 1B gene suggested an important role in development and function of the nervous system. MAP1B
ENSG00000100505 114088 tripartite motif containing 9 The protein encoded by this gene is a member of the tripartite motif (TRIM) family. The TRIM motif includes three zinc-binding domains, a RING, a B-box type 1 and a B-box type 2, and a coiled-coil region. The protein localizes to cytoplasmic bodies. Its function has not been identified. Alternate splicing of this gene generates two transcript variants encoding different isoforms. TRIM9
ENSG00000132535 1742 discs large MAGUK scaffold protein 4 This gene encodes a member of the membrane-associated guanylate kinase (MAGUK) family. It heteromultimerizes with another MAGUK protein, DLG2, and is recruited into NMDA receptor and potassium channel clusters. These two MAGUK proteins may interact at postsynaptic sites to form a multimeric scaffold for the clustering of receptors, ion channels, and associated signaling proteins. Multiple transcript variants encoding different isoforms have been found for this gene. DLG4
ENSG00000007237 8522 growth arrest specific 7 Growth arrest-specific 7 is expressed primarily in terminally differentiated brain cells and predominantly in mature cerebellar Purkinje neurons. GAS7 plays a putative role in neuronal development. Several transcript variants encoding proteins which vary in the N-terminus have been described. GAS7
ENSG00000247556 ENSG00000247556 OIP5 antisense RNA 1 NA OIP5-AS1
ENSG00000154277 7345 ubiquitin C-terminal hydrolase L1 The protein encoded by this gene belongs to the peptidase C12 family. This enzyme is a thiol protease that hydrolyzes a peptide bond at the C-terminal glycine of ubiquitin. This gene is specifically expressed in the neurons and in cells of the diffuse neuroendocrine system. Mutations in this gene may be associated with Parkinson disease. UCHL1
ENSG00000107742 9806 sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 2 This gene encodes a protein which binds with glycosaminoglycans to form part of the extracellular matrix. The protein contains thyroglobulin type-1, follistatin-like, and calcium-binding domains, and has glycosaminoglycan attachment sites in the acidic C-terminal region. Three alternatively spliced transcript variants that encode different protein isoforms have been described for this gene. SPOCK2
ENSG00000163359 1293 collagen type VI alpha 3 chain This gene encodes the alpha-3 chain, one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The alpha-3 chain of type VI collagen is much larger than the alpha-1 and -2 chains. This difference in size is largely due to an increase in the number of subdomains, similar to von Willebrand Factor type A domains, that are found in the amino terminal globular domain of all the alpha chains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in the type VI collagen genes are associated with Bethlem myopathy, a rare autosomal dominant proximal myopathy with early childhood onset. Mutations in this gene are also a cause of Ullrich congenital muscular dystrophy, also referred to as Ullrich scleroatonic muscular dystrophy, an autosomal recessive congenital myopathy that is more severe than Bethlem myopathy. Multiple transcript variants have been identified, but the full-length nature of only some of these variants has been described. COL6A3
ENSG00000206172 3039 hemoglobin subunit alpha 1 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. HBA1
ENSG00000104964 166 amino-terminal enhancer of split The protein encoded by this gene is similar in sequence to the amino terminus of Drosophila enhancer of split groucho, a protein involved in neurogenesis during embryonic development. The encoded protein, which belongs to the groucho/TLE family of proteins, can function as a homooligomer or as a heterooligomer with other family members to dominantly repress the expression of other family member genes. Three transcript variants encoding different isoforms have been found for this gene. AES
ENSG00000156508 1915 eukaryotic translation elongation factor 1 alpha 1 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas, and the other isoform (alpha 2) is expressed in brain, heart and skeletal muscle. This isoform is identified as an autoantigen in 66% of patients with Felty syndrome. This gene has been found to have multiple copies on many chromosomes, some of which, if not all, represent different pseudogenes. EEF1A1
ENSG00000023171 57476 GRAM domain containing 1B NA GRAMD1B
ENSG00000073670 4185 ADAM metallopeptidase domain 11 This gene encodes a member of the ADAM (a disintegrin and metalloprotease) protein family. Members of this family are membrane-anchored proteins structurally related to snake venom disintegrins, and have been implicated in a variety of biological processes involving cell-cell and cell-matrix interactions, including fertilization, muscle development, and neurogenesis. The encoded preproprotein is proteolytically processed to generate the mature protease. This gene represents a candidate tumor suppressor gene for human breast cancer based on its location within a minimal region of chromosome 17q21 previously defined by tumor deletion mapping. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. ADAM11
ENSG00000087250 4504 metallothionein 3 NA MT3
ENSG00000221890 23467 neuronal pentraxin receptor This gene encodes a protein similar to the rat neuronal pentraxin receptor. The rat pentraxin receptor is an integral membrane protein that is thought to mediate neuronal uptake of the snake venom toxin, taipoxin, and its transport into the synapses. Studies in rat indicate that translation of this mRNA initiates at a non-AUG (CUG) codon. This may also be true for mouse and human, based on strong sequence conservation amongst these species. NPTXR
ENSG00000092096 51310 solute carrier family 22 member 17 NA SLC22A17
ENSG00000111341 4256 matrix Gla protein The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. MGP
ENSG00000105270 25999 CAP-Gly domain containing linker protein 3 This gene encodes a member of the cytoplasmic linker protein 170 family. Members of this protein family contain a cytoskeleton-associated protein glycine-rich domain and mediate the interaction of microtubules with cellular organelles. The encoded protein plays a role in T cell apoptosis by facilitating the association of tubulin and the lipid raft ganglioside GD3. The encoded protein also functions as a scaffold protein mediating membrane localization of phosphorylated protein kinase B. Alternatively spliced transcript variants have been observed for this gene. CLIP3
ENSG00000159164 9900 synaptic vesicle glycoprotein 2A NA SV2A
ENSG00000225630 ENSG00000225630 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 2 pseudogene 28 NA MTND2P28
ENSG00000105711 6324 sodium voltage-gated channel beta subunit 1 Voltage-gated sodium channels are heteromeric proteins that function in the generation and propagation of action potentials in muscle and neuronal cells. They are composed of one alpha and two beta subunits, where the alpha subunit provides channel activity and the beta-1 subunit modulates the kinetics of channel inactivation. This gene encodes a sodium channel beta-1 subunit. Mutations in this gene result in generalized epilepsy with febrile seizures plus, Brugada syndrome 5, and defects in cardiac conduction. Multiple transcript variants encoding different isoforms have been found for this gene. SCN1B
ENSG00000120708 7045 transforming growth factor beta induced This gene encodes an RGD-containing protein that binds to type I, II and IV collagens. The RGD motif is found in many extracellular matrix proteins modulating cell adhesion and serves as a ligand recognition sequence for several integrins. This protein plays a role in cell-collagen interactions and may be involved in endochondral bone formation in cartilage. The protein is induced by transforming growth factor-beta and acts to inhibit cell adhesion. Mutations in this gene are associated with multiple types of corneal dystrophy. TGFBI
ENSG00000089220 5037 phosphatidylethanolamine binding protein 1 This gene encodes a member of the phosphatidylethanolamine-binding family of proteins and has been shown to modulate multiple signaling pathways, including the MAP kinase (MAPK), NF-kappa B, and glycogen synthase kinase-3 (GSK-3) signaling pathways. The encoded protein can be further processed to form a smaller cleavage product, hippocampal cholinergic neurostimulating peptide (HCNP), which may be involved in neural development. This gene has been implicated in numerous human cancers and may act as a metastasis suppressor gene. Multiple pseudogenes of this gene have been identified in the genome. PEBP1
ENSG00000117016 9783 regulating synaptic membrane exocytosis 3 NA RIMS3
ENSG00000197457 50861 stathmin 3 This gene encodes a protein which is a member of the stathmin protein family. Members of this protein family form a complex with tubulins at a ratio of 2 tubulins for each stathmin protein. Microtubules require the ordered assembly of alpha- and beta-tubulins, and formation of a complex with stathmin disrupts microtubule formation and function. A pseudogene of this gene is located on chromosome 22. Alternative splicing results in multiple transcript variants. STMN3
ENSG00000160460 57731 spectrin beta, non-erythrocytic 4 Spectrin is an actin crosslinking and molecular scaffold protein that links the plasma membrane to the actin cytoskeleton, and functions in the determination of cell shape, arrangement of transmembrane proteins, and organization of organelles. It is composed of two antiparallel dimers of alpha- and beta- subunits. This gene is one member of a family of beta-spectrin genes. The encoded protein localizes to the nuclear matrix, PML nuclear bodies, and cytoplasmic vesicles. A highly similar gene in the mouse is required for localization of specific membrane proteins in polarized regions of neurons. Multiple transcript variants encoding different isoforms have been found for this gene. SPTBN4
ENSG00000237973 ENSG00000237973 MT-CO1 pseudogene 12 NA MTCO1P12
ENSG00000075624 60 actin, beta This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. ACTB
ENSG00000141753 3487 insulin like growth factor binding protein 4 This gene is a member of the insulin-like growth factor binding protein (IGFBP) family and encodes a protein with an IGFBP domain and a thyroglobulin type-I domain. The protein binds both insulin-like growth factors (IGFs) I and II and circulates in the plasma in both glycosylated and non-glycosylated forms. Binding of this protein prolongs the half-life of the IGFs and alters their interaction with cell surface receptors. IGFBP4
ENSG00000135439 116986 ArfGAP with GTPase domain, ankyrin repeat and PH domain 2 The protein encoded by this gene belongs to the centaurin gamma-like family. It mediates anti-apoptotic effects of nerve growth factor by activating nuclear phosphoinositide 3-kinase. It is overexpressed in cancer cells, and promotes cancer cell invasion. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. AGAP2
ENSG00000090006 8425 latent transforming growth factor beta binding protein 4 The protein encoded by this gene binds transforming growth factor beta (TGFB) as it is secreted and targeted to the extracellular matrix. TGFB is biologically latent after secretion and insertion into the extracellular matrix, and sheds TGFB and other proteins upon activation. Defects in this gene may be a cause of cutis laxa and severe pulmonary, gastrointestinal, and urinary abnormalities. Three transcript variants encoding different isoforms have been found for this gene. LTBP4
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_",5,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
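
The query, table, and write steps above are repeated for each factor, with the factor index typed by hand in the output file name, and the column order of the printed tables differs between factors. A minimal sketch of one way to keep these steps consistent is given below. The helper names `tidy_annotations` and `write_factor_ids`, the chosen column order, and the use of `tempdir()` as output directory are assumptions made for illustration and are not part of the original analysis; the toy data frame stands in for a `mygene::queryMany()` result so the example runs without network access.

```r
# Hypothetical column order for the annotation tables (an assumption,
# chosen so that every factor prints its columns in the same order).
annot_cols <- c("query", "X_id", "symbol", "name", "summary")

# Hypothetical helper: coerce a query result to a data frame and keep
# the expected columns that are present, in the fixed order above.
tidy_annotations <- function(out, cols = annot_cols) {
  df <- as.data.frame(out)
  df[, intersect(cols, colnames(df)), drop = FALSE]
}

# Hypothetical helper: write the queried Ensembl IDs for factor k,
# building the file name from k instead of typing the index by hand.
write_factor_ids <- function(df, k, dir = tempdir()) {
  path <- file.path(dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(df$query, path,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  path
}

# Toy stand-in for a query result (two rows copied from the table above;
# the summary column is left out to show that missing columns are skipped).
toy <- data.frame(
  symbol = c("ACTB", "VIM"),
  X_id   = c("60", "7431"),
  query  = c("ENSG00000075624", "ENSG00000026025"),
  name   = c("actin, beta", "vimentin"),
  stringsAsFactors = FALSE
)

tidy <- tidy_annotations(toy)
path <- write_factor_ids(tidy, k = 5)
```

With helpers of this kind, each factor section would reduce to one query call followed by `kable(tidy_annotations(out))` and `write_factor_ids(tidy_annotations(out), k)`, assuming the query result coerces to a data frame as it does in the chunks above.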

Factor 6 Annotations

out <- mygene::queryMany(gene_list[6,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id summary query name
HBB 3043 The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. ENSG00000244734 hemoglobin subunit beta
HBA1 3039 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. ENSG00000206172 hemoglobin subunit alpha 1
COL1A1 1277 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. ENSG00000108821 collagen type I alpha 1
SPTBN1 6711 Spectrin is an actin crosslinking and molecular scaffold protein that links the plasma membrane to the actin cytoskeleton, and functions in the determination of cell shape, arrangement of transmembrane proteins, and organization of organelles. It is composed of two antiparallel dimers of alpha- and beta- subunits. This gene is one member of a family of beta-spectrin genes. The encoded protein contains an N-terminal actin-binding domain, and 17 spectrin repeats which are involved in dimer formation. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000115306 spectrin beta, non-erythrocytic 1
S100A9 6280 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. ENSG00000163220 S100 calcium binding protein A9
KRT1 3848 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000167768 keratin 1
SPARCL1 8404 NA ENSG00000152583 SPARC like 1
FN1 2335 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. ENSG00000115414 fibronectin 1
HBA2 3040 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. ENSG00000188536 hemoglobin subunit alpha 2
SPARC 6678 This gene encodes a cysteine-rich acidic matrix-associated protein. The encoded protein is required for the collagen in bone to become calcified but is also involved in extracellular matrix synthesis and promotion of changes to cell shape. The gene product has been associated with tumor suppression but has also been correlated with metastasis based on changes to cell shape which can promote tumor cell invasion. Three transcript variants encoding different isoforms have been found for this gene. ENSG00000113140 secreted protein acidic and cysteine rich
CD81 975 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. This encoded protein is a cell surface glycoprotein that is known to complex with integrins. This protein appears to promote muscle cell fusion and support myotube maintenance. Also it may be involved in signal transduction. This gene is localized in the tumor-suppressor gene region and thus it is a candidate gene for malignancies. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000110651 CD81 molecule
MBP 4155 The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. ENSG00000197971 myelin basic protein
FSTL1 11167 This gene encodes a protein with similarity to follistatin, an activin-binding protein. It contains an FS module, a follistatin-like sequence containing 10 conserved cysteine residues. This gene product is thought to be an autoantigen associated with rheumatoid arthritis. ENSG00000163430 follistatin like 1
EPAS1 2034 This gene encodes a transcription factor involved in the induction of genes regulated by oxygen, which is induced as oxygen levels fall. The encoded protein contains a basic-helix-loop-helix domain protein dimerization domain as well as a domain found in proteins in signal transduction pathways which respond to oxygen levels. Mutations in this gene are associated with erythrocytosis familial type 4. ENSG00000116016 endothelial PAS domain protein 1
COL1A2 1278 This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. ENSG00000164692 collagen type I alpha 2 chain
LRP1 4035 This gene encodes a member of the low-density lipoprotein receptor family of proteins. The encoded preproprotein is proteolytically processed by furin to generate 515 kDa and 85 kDa subunits that form the mature receptor (PMID: 8546712). This receptor is involved in several cellular processes, including intracellular signaling, lipid homeostasis, and clearance of apoptotic cells. In addition, the encoded protein is necessary for the alpha 2-macroglobulin-mediated clearance of secreted amyloid precursor protein and beta-amyloid, the main component of amyloid plaques found in Alzheimer patients. Expression of this gene decreases with age and has been found to be lower than controls in brain tissue from Alzheimer’s disease patients. ENSG00000123384 LDL receptor related protein 1
A2M 2 Alpha-2-macroglobulin is a protease inhibitor and cytokine transporter. It inhibits many proteases, including trypsin, thrombin and collagenase. A2M is implicated in Alzheimer disease (AD) due to its ability to mediate the clearance and degradation of A-beta, the major component of beta-amyloid deposits. ENSG00000175899 alpha-2-macroglobulin
CTGF 1490 The protein encoded by this gene is a mitogen that is secreted by vascular endothelial cells. The encoded protein plays a role in chondrocyte proliferation and differentiation, cell adhesion in many cell types, and is related to platelet-derived growth factor. Certain polymorphisms in this gene have been linked with a higher incidence of systemic sclerosis. ENSG00000118523 connective tissue growth factor
KRT10 3858 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. ENSG00000186395 keratin 10
SLC25A39 51629 This gene encodes a member of the SLC25 transporter or mitochondrial carrier family of proteins. Members of this family are encoded by the nuclear genome while their protein products are usually embedded in the inner mitochondrial membrane and exhibit wide-ranging substrate specificity. Although the encoded protein is currently considered an orphan transporter, this protein is related to other carriers known to transport amino acids. This protein may play a role in iron homeostasis. ENSG00000013306 solute carrier family 25 member 39
CSRP1 1465 This gene encodes a member of the cysteine-rich protein (CSRP) family. This gene family includes a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this gene product occurs in proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Alternatively spliced transcript variants have been described. ENSG00000159176 cysteine and glycine rich protein 1
APP 351 This gene encodes a cell surface receptor and transmembrane precursor protein that is cleaved by secretases to form a number of peptides. Some of these peptides are secreted and can bind to the acetyltransferase complex APBB1/TIP60 to promote transcriptional activation, while others form the protein basis of the amyloid plaques found in the brains of patients with Alzheimer disease. In addition, two of the peptides are antimicrobial peptides, having been shown to have bactericidal and antifungal activities. Mutations in this gene have been implicated in autosomal dominant Alzheimer disease and cerebroarterial amyloidosis (cerebral amyloid angiopathy). Multiple transcript variants encoding several different isoforms have been found for this gene. ENSG00000142192 amyloid beta precursor protein
MTND2P28 ENSG00000225630 NA ENSG00000225630 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 2 pseudogene 28
EPB41L2 2037 NA ENSG00000079819 erythrocyte membrane protein band 4.1 like 2
YBX3 8531 NA ENSG00000060138 Y-box binding protein 3
CNN3 1266 This gene encodes a protein with a markedly acidic C terminus; the basic N-terminus is highly homologous to the N-terminus of a related gene, CNN1. Members of the CNN gene family all contain similar tandemly repeated motifs. This encoded protein is associated with the cytoskeleton but is not involved in contraction. ENSG00000117519 calponin 3
FTL 2512 This gene encodes the light subunit of the ferritin protein. Ferritin is the major intracellular iron storage protein in prokaryotes and eukaryotes. It is composed of 24 subunits of the heavy and light ferritin chains. Variation in ferritin subunit composition may affect the rates of iron uptake and release in different tissues. A major function of ferritin is the storage of iron in a soluble and nontoxic state. Defects in this light chain ferritin gene are associated with several neurodegenerative diseases and hyperferritinemia-cataract syndrome. This gene has multiple pseudogenes. ENSG00000087086 ferritin, light polypeptide
PDGFRB 5159 This gene encodes a cell surface tyrosine kinase receptor for members of the platelet-derived growth factor family. These growth factors are mitogens for cells of mesenchymal origin. The identity of the growth factor bound to a receptor monomer determines whether the functional receptor is a homodimer or a heterodimer, composed of both platelet-derived growth factor receptor alpha and beta polypeptides. This gene is flanked on chromosome 5 by the genes for granulocyte-macrophage colony-stimulating factor and macrophage-colony stimulating factor receptor; all three genes may be implicated in the 5-q syndrome. A translocation between chromosomes 5 and 12, that fuses this gene to that of the translocation, ETV6, leukemia gene, results in chronic myeloproliferative disorder with eosinophilia. ENSG00000113721 platelet derived growth factor receptor beta
LGALS3BP 3959 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. LGALS3BP has been found elevated in the serum of patients with cancer and in those infected by the human immunodeficiency virus (HIV). It appears to be implicated in immune response associated with natural killer (NK) and lymphokine-activated killer (LAK) cell cytotoxicity. Using fluorescence in situ hybridization the full length 90K cDNA has been localized to chromosome 17q25. The native protein binds specifically to a human macrophage-associated lectin known as Mac-2 and also binds galectin 1. ENSG00000108679 galectin 3 binding protein
CALM2 805 This gene is a member of the calmodulin gene family. There are three distinct calmodulin genes dispersed throughout the genome that encode the identical protein, but differ at the nucleotide level. Calmodulin is a calcium binding protein that plays a role in signaling pathways, cell cycle progression and proliferation. Several infants with severe forms of long-QT syndrome (LQTS) who displayed life-threatening ventricular arrhythmias together with delayed neurodevelopment and epilepsy were found to have mutations in either this gene or another member of the calmodulin gene family (PMID:23388215). Mutations in this gene have also been identified in patients with less severe forms of LQTS (PMID:24917665), while mutations in another calmodulin gene family member have been associated with catecholaminergic polymorphic ventricular tachycardia (CPVT)(PMID:23040497), a rare disorder thought to be the cause of a significant fraction of sudden cardiac deaths in young individuals. Pseudogenes of this gene are found on chromosomes 10, 13, and 17. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000143933 calmodulin 2 (phosphorylase kinase, delta)
KIAA0930 23313 NA ENSG00000100364 KIAA0930
CKB 1152 The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in brain as well as in other tissues, and as a heterodimer with a similar muscle isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. A pseudogene of this gene has been characterized. ENSG00000166165 creatine kinase B
COL6A3 1293 This gene encodes the alpha-3 chain, one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The alpha-3 chain of type VI collagen is much larger than the alpha-1 and -2 chains. This difference in size is largely due to an increase in the number of subdomains, similar to von Willebrand Factor type A domains, that are found in the amino terminal globular domain of all the alpha chains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in the type VI collagen genes are associated with Bethlem myopathy, a rare autosomal dominant proximal myopathy with early childhood onset. Mutations in this gene are also a cause of Ullrich congenital muscular dystrophy, also referred to as Ullrich scleroatonic muscular dystrophy, an autosomal recessive congenital myopathy that is more severe than Bethlem myopathy. Multiple transcript variants have been identified, but the full-length nature of only some of these variants has been described. ENSG00000163359 collagen type VI alpha 3 chain
TG 7038 Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. ENSG00000042832 thyroglobulin
CST3 1471 The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and the kininogens. The type 2 cystatin proteins are a class of cysteine proteinase inhibitors found in a variety of human fluids and secretions, where they appear to provide protective functions. The cystatin locus on chromosome 20 contains the majority of the type 2 cystatin genes and pseudogenes. This gene is located in the cystatin locus and encodes the most abundant extracellular inhibitor of cysteine proteases, which is found in high concentrations in biological fluids and is expressed in virtually all organs of the body. A mutation in this gene has been associated with amyloid angiopathy. Expression of this protein in vascular wall smooth muscle cells is severely reduced in both atherosclerotic and aneurysmal aortic lesions, establishing its role in vascular disease. In addition, this protein has been shown to have an antimicrobial function, inhibiting the replication of herpes simplex virus. Alternative splicing results in multiple transcript variants encoding a single protein. ENSG00000101439 cystatin C
PMP22 5376 This gene encodes an integral membrane protein that is a major component of myelin in the peripheral nervous system. Studies suggest two alternately used promoters drive tissue-specific expression. Various mutations of this gene are causes of Charcot-Marie-Tooth disease Type IA, Dejerine-Sottas syndrome, and hereditary neuropathy with liability to pressure palsies. Alternative splicing results in multiple transcript variants. ENSG00000109099 peripheral myelin protein 22
HSP90AA1 3320 The protein encoded by this gene is an inducible molecular chaperone that functions as a homodimer. The encoded protein aids in the proper folding of specific target proteins by use of an ATPase activity that is modulated by co-chaperones. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000080824 heat shock protein 90kDa alpha family class A member 1
HLA-B 3106 HLA-B belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon 1 encodes the leader peptide, exon 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Hundreds of HLA-B alleles have been described. ENSG00000234745 major histocompatibility complex, class I, B
ZCCHC24 219654 NA ENSG00000165424 zinc finger CCHC-type containing 24
LAMB1 3912 Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. Several isoforms of each chain have been described. Different alpha, beta and gamma chain isomers combine to give rise to different heterotrimeric laminin isoforms which are designated by Arabic numerals in the order of their discovery, i.e. alpha1beta1gamma1 heterotrimer is laminin 1. The biological functions of the different chains and trimer molecules are largely unknown, but some of the chains have been shown to differ with respect to their tissue distribution, presumably reflecting diverse functions in vivo. This gene encodes the beta chain isoform laminin, beta 1. The beta 1 chain has 7 structurally distinct domains which it shares with other beta chain isomers. The C-terminal helical region containing domains I and II are separated by domain alpha, domains III and V contain several EGF-like repeats, and domains IV and VI have a globular conformation. Laminin, beta 1 is expressed in most tissues that produce basement membranes, and is one of the 3 chains constituting laminin 1, the first laminin isolated from Engelbreth-Holm-Swarm (EHS) tumor. A sequence in the beta 1 chain that is involved in cell attachment, chemotaxis, and binding to the laminin receptor was identified and shown to have the capacity to inhibit metastasis. ENSG00000091136 laminin subunit beta 1
SEPT2 4735 NA ENSG00000168385 septin 2
TTC3 7267 NA ENSG00000182670 tetratricopeptide repeat domain 3
CCDC80 151887 NA ENSG00000091986 coiled-coil domain containing 80
UBA52 7311 Ubiquitin is a highly conserved nuclear and cytoplasmic protein that has a major role in targeting cellular proteins for degradation by the 26S proteasome. It is also involved in the maintenance of chromatin structure, the regulation of gene expression, and the stress response. Ubiquitin is synthesized as a precursor protein consisting of either polyubiquitin chains or a single ubiquitin moiety fused to an unrelated protein. This gene encodes a fusion protein consisting of ubiquitin at the N terminus and ribosomal protein L40 at the C terminus, a C-terminal extension protein (CEP). Multiple processed pseudogenes derived from this gene are present in the genome. ENSG00000221983 ubiquitin A-52 residue ribosomal protein fusion product 1
GAS7 8522 Growth arrest-specific 7 is expressed primarily in terminally differentiated brain cells and predominantly in mature cerebellar Purkinje neurons. GAS7 plays a putative role in neuronal development. Several transcript variants encoding proteins which vary in the N-terminus have been described. ENSG00000007237 growth arrest specific 7
LAPTM5 7805 This gene encodes a transmembrane receptor that is associated with lysosomes. The encoded protein, also known as E3 protein, may play a role in hematopoiesis. ENSG00000162511 lysosomal protein transmembrane 5
ZEB2 9839 The protein encoded by this gene is a member of the Zfh1 family of 2-handed zinc finger/homeodomain proteins. It is located in the nucleus and functions as a DNA-binding transcriptional repressor that interacts with activated SMADs. Mutations in this gene are associated with Hirschsprung disease/Mowat-Wilson syndrome. Alternatively spliced transcript variants have been found for this gene. ENSG00000169554 zinc finger E-box binding homeobox 2
DST 667 This gene encodes a member of the plakin protein family of adhesion junction plaque proteins. Multiple alternatively spliced transcript variants encoding distinct isoforms have been found for this gene, but the full-length nature of some variants has not been defined. It has been reported that some isoforms are expressed in neural and muscle tissue, anchoring neural intermediate filaments to the actin cytoskeleton, and some isoforms are expressed in epithelial tissue, anchoring keratin-containing intermediate filaments to hemidesmosomes. Consistent with the expression, mice defective for this gene show skin blistering and neurodegeneration. ENSG00000151914 dystonin
INF2 64423 This gene represents a member of the formin family of proteins. It is considered a diaphanous formin due to the presence of a diaphanous inhibitory domain located at the N-terminus of the encoded protein. Studies of a similar mouse protein indicate that the protein encoded by this locus may function in polymerization and depolymerization of actin filaments. Mutations at this locus have been associated with focal segmental glomerulosclerosis 5. ENSG00000203485 inverted formin, FH2 and WH2 domain containing
COL16A1 1307 This gene encodes the alpha chain of type XVI collagen, a member of the FACIT collagen family (fibril-associated collagens with interrupted helices). Members of this collagen family are found in association with fibril-forming collagens such as type I and II, and serve to maintain the integrity of the extracellular matrix. High levels of type XVI collagen have been found in fibroblasts and keratinocytes, and in smooth muscle and amnion. ENSG00000084636 collagen type XVI alpha 1 chain
CLU 1191 The protein encoded by this gene is a secreted chaperone that can under some stress conditions also be found in the cell cytosol. It has been suggested to be involved in several basic biological events such as cell death, tumor progression, and neurodegenerative disorders. Alternate splicing results in both coding and non-coding variants. ENSG00000120885 clusterin
IL6ST 3572 The protein encoded by this gene is a signal transducer shared by many cytokines, including interleukin 6 (IL6), ciliary neurotrophic factor (CNTF), leukemia inhibitory factor (LIF), and oncostatin M (OSM). This protein functions as a part of the cytokine receptor complex. The activation of this protein is dependent upon the binding of cytokines to their receptors. vIL6, a protein related to IL6 and encoded by the Kaposi sarcoma-associated herpesvirus, can bypass the interleukin 6 receptor (IL6R) and directly activate this protein. Knockout studies in mice suggest that this gene plays a critical role in regulating myocyte apoptosis. Alternatively spliced transcript variants have been described. A related pseudogene has been identified on chromosome 17. ENSG00000134352 interleukin 6 signal transducer
RTN3 10313 This gene belongs to the reticulon family of highly conserved genes that are preferentially expressed in neuroendocrine tissues. This family of proteins interact with, and modulate the activity of beta-amyloid converting enzyme 1 (BACE1), and the production of amyloid-beta. An increase in the expression of any reticulon protein substantially reduces the production of amyloid-beta, suggesting that reticulon proteins are negative modulators of BACE1 in cells. Alternatively spliced transcript variants encoding different isoforms have been found for this gene, and pseudogenes of this gene are located on chromosomes 4 and 12. ENSG00000133318 reticulon 3
CANX 821 This gene encodes a member of the calnexin family of molecular chaperones. The encoded protein is a calcium-binding, endoplasmic reticulum (ER)-associated protein that interacts transiently with newly synthesized N-linked glycoproteins, facilitating protein folding and assembly. It may also play a central role in the quality control of protein folding by retaining incorrectly folded protein subunits within the ER for degradation. Alternatively spliced transcript variants encoding the same protein have been described. ENSG00000127022 calnexin
ALDH1A1 216 The protein encoded by this gene belongs to the aldehyde dehydrogenase family. Aldehyde dehydrogenase is the next enzyme after alcohol dehydrogenase in the major pathway of alcohol metabolism. There are two major aldehyde dehydrogenase isozymes in the liver, cytosolic and mitochondrial, which are encoded by distinct genes, and can be distinguished by their electrophoretic mobility, kinetic properties, and subcellular localization. This gene encodes the cytosolic isozyme. Studies in mice show that through its role in retinol metabolism, this gene may also be involved in the regulation of the metabolic responses to high-fat diet. ENSG00000165092 aldehyde dehydrogenase 1 family member A1
LUM 4060 This gene encodes a member of the small leucine-rich proteoglycan (SLRP) family that includes decorin, biglycan, fibromodulin, keratocan, epiphycan, and osteoglycin. In these bifunctional molecules, the protein moiety binds collagen fibrils and the highly charged hydrophilic glycosaminoglycans regulate interfibrillar spacings. Lumican is the major keratan sulfate proteoglycan of the cornea but is also distributed in interstitial collagenous matrices throughout the body. Lumican may regulate collagen fibril organization and circumferential growth, corneal transparency, and epithelial cell migration and tissue repair. ENSG00000139329 lumican
FXYD6 53826 This gene encodes a member of the FXYD family of transmembrane proteins. This particular gene encodes phosphohippolin, which likely affects the activity of Na,K-ATPase. Multiple alternatively spliced transcript variants encoding the same protein have been described. Related pseudogenes have been identified on chromosomes 10 and X. Read-through transcripts have been observed between this locus and the downstream sodium/potassium-transporting ATPase subunit gamma (FXYD2, GeneID 486) locus. ENSG00000137726 FXYD domain containing ion transport regulator 6
KRT13 3860 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. ENSG00000171401 keratin 13
RAB31 11031 Small GTP-binding proteins of the RAB family, such as RAB31, play essential roles in vesicle and granule targeting (Bao et al., 2002 [PubMed 11784320]). ENSG00000168461 RAB31, member RAS oncogene family
DDIT4 54541 NA ENSG00000168209 DNA damage inducible transcript 4
NDRG2 57447 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that may play a role in neurite outgrowth. This gene may be involved in glioblastoma carcinogenesis. Several alternatively spliced transcript variants of this gene have been described, but the full-length nature of some of these variants has not been determined. ENSG00000165795 NDRG family member 2
MXRA8 54587 NA ENSG00000162576 matrix remodeling associated 8
FARP1 10160 This gene encodes a protein containing a FERM (4.1, ezrin, radixin, moesin) domain, a Dbl homology domain, and two pleckstrin homology domains. These domains are found in guanine nucleotide exchange factors and proteins that link the cytoskeleton to the cell membrane. The encoded protein functions in neurons to promote dendritic growth. Alternative splicing results in multiple transcript variants. ENSG00000152767 FERM, ARH/RhoGEF and pleckstrin domain protein 1
HSP90B1 7184 This gene encodes a member of a family of adenosine triphosphate(ATP)-metabolizing molecular chaperones with roles in stabilizing and folding other proteins. The encoded protein is localized to melanosomes and the endoplasmic reticulum. Expression of this protein is associated with a variety of pathogenic states, including tumor formation. There is a microRNA gene located within the 5’ exon of this gene. There are pseudogenes for this gene on chromosomes 1 and 15. ENSG00000166598 heat shock protein 90kDa beta family member 1
KIAA1462 57608 NA ENSG00000165757 KIAA1462
COL5A1 1289 This gene encodes an alpha chain for one of the low abundance fibrillar collagens. Fibrillar collagen molecules are trimers that can be composed of one or more types of alpha chains. Type V collagen is found in tissues containing type I collagen and appears to regulate the assembly of heterotypic fibers composed of both type I and type V collagen. This gene product is closely related to type XI collagen and it is possible that the collagen chains of types V and XI constitute a single collagen type with tissue-specific chain combinations. The encoded procollagen protein occurs commonly as the heterotrimer pro-alpha1(V)-pro-alpha1(V)-pro-alpha2(V). Mutations in this gene are associated with Ehlers-Danlos syndrome, types I and II. Alternative splicing of this gene results in multiple transcript variants. ENSG00000130635 collagen type V alpha 1
MATR3 9782 This gene encodes a nuclear matrix protein, which is proposed to stabilize certain messenger RNA species. Mutations of this gene are associated with distal myopathy 2, which often includes vocal cord and pharyngeal weakness. Alternatively spliced transcript variants, including read-through transcripts composed of the upstream small nucleolar RNA host gene 4 (non-protein coding) and matrin 3 gene sequence, have been identified. Pseudogenes of this gene are located on chromosomes 1 and X. ENSG00000015479 matrin 3
AKAP12 9590 The A-kinase anchor proteins (AKAPs) are a group of structurally diverse proteins, which have the common function of binding to the regulatory subunit of protein kinase A (PKA) and confining the holoenzyme to discrete locations within the cell. This gene encodes a member of the AKAP family. The encoded protein is expressed in endothelial cells, cultured fibroblasts, and osteosarcoma cells. It associates with protein kinases A and C and phosphatase, and serves as a scaffold protein in signal transduction. This protein and RII PKA colocalize at the cell periphery. This protein is a cell growth-related protein. Antibodies to this protein can be produced by patients with myasthenia gravis. Alternative splicing of this gene results in two transcript variants encoding different isoforms. ENSG00000131016 A-kinase anchoring protein 12
AHNAK 79026 NA ENSG00000124942 AHNAK nucleoprotein
COL3A1 1281 This gene encodes the pro-alpha1 chains of type III collagen, a fibrillar collagen that is found in extensible connective tissues such as skin, lung, uterus, intestine and the vascular system, frequently in association with type I collagen. Mutations in this gene are associated with Ehlers-Danlos syndrome type IV, and with aortic and arterial aneurysms. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. ENSG00000168542 collagen type III alpha 1 chain
MRC2 9902 This gene encodes a member of the mannose receptor family of proteins that contain a fibronectin type II domain and multiple C-type lectin-like domains. The encoded protein plays a role in extracellular matrix remodeling by mediating the internalization and lysosomal degradation of collagen ligands. Expression of this gene may play a role in the tumorigenesis and metastasis of several malignancies including breast cancer, gliomas and metastatic bone disease. ENSG00000011028 mannose receptor C type 2
ITGA1 3672 This gene encodes the alpha 1 subunit of integrin receptors. This protein heterodimerizes with the beta 1 subunit to form a cell-surface receptor for collagen and laminin. The heterodimeric receptor is involved in cell-cell adhesion and may play a role in inflammation and fibrosis. The alpha 1 subunit contains an inserted (I) von Willebrand factor type I domain which is thought to be involved in collagen binding. ENSG00000213949 integrin subunit alpha 1
SLC25A37 51312 SLC25A37 is a solute carrier localized in the mitochondrial inner membrane. It functions as an essential iron importer for the synthesis of mitochondrial heme and iron-sulfur clusters (summary by Chen et al., 2009 [PubMed 19805291]). ENSG00000147454 solute carrier family 25 member 37
COL4A2 1284 This gene encodes one of the six subunits of type IV collagen, the major structural component of basement membranes. The C-terminal portion of the protein, known as canstatin, is an inhibitor of angiogenesis and tumor growth. Like the other members of the type IV collagen gene family, this gene is organized in a head-to-head conformation with another type IV collagen gene so that each gene pair shares a common promoter. ENSG00000134871 collagen type IV alpha 2
RBM38 55544 NA ENSG00000132819 RNA binding motif protein 38
KIAA0754 643314 NA ENSG00000127603 KIAA0754
MACF1 23499 This gene encodes a large protein containing numerous spectrin and leucine-rich repeat (LRR) domains. The encoded protein is a member of a family of proteins that form bridges between different cytoskeletal elements. This protein facilitates actin-microtubule interactions at the cell periphery and couples the microtubule network to cellular junctions. Alternative splicing results in multiple transcript variants, but the full-length nature of some of these variants has not been determined. ENSG00000127603 microtubule-actin crosslinking factor 1
TMBIM6 7009 NA ENSG00000139644 transmembrane BAX inhibitor motif containing 6
DCN 1634 This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. ENSG00000011465 decorin
ANXA1 301 This gene encodes a membrane-localized protein that binds phospholipids. This protein inhibits phospholipase A2 and has anti-inflammatory activity. Loss of function or expression of this gene has been detected in multiple tumors. ENSG00000135046 annexin A1
EFEMP1 2202 This gene encodes a member of the fibulin family of extracellular matrix glycoproteins. Like all members of this family, the encoded protein contains tandemly repeated epidermal growth factor-like repeats followed by a C-terminus fibulin-type domain. This gene is upregulated in malignant gliomas and may play a role in the aggressive nature of these tumors. Mutations in this gene are associated with Doyne honeycomb retinal dystrophy. Alternatively spliced transcript variants that encode the same protein have been described. ENSG00000115380 EGF containing fibulin like extracellular matrix protein 1
COL4A1 1282 This gene encodes a type IV collagen alpha protein. Type IV collagen proteins are integral components of basement membranes. This gene shares a bidirectional promoter with a paralogous gene on the opposite strand. The protein consists of an amino-terminal 7S domain, a triple-helix forming collagenous domain, and a carboxy-terminal non-collagenous domain. It functions as part of a heterotrimer and interacts with other extracellular matrix components such as perlecans, proteoglycans, and laminins. In addition, proteolytic cleavage of the non-collagenous carboxy-terminal domain results in a biologically active fragment known as arresten, which has anti-angiogenic and tumor suppressor properties. Mutations in this gene cause porencephaly, cerebrovascular disease, and renal and muscular defects. Alternative splicing results in multiple transcript variants. ENSG00000187498 collagen type IV alpha 1 chain
ACTA1 58 The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. ENSG00000143632 actin, alpha 1, skeletal muscle
PTGDS 5730 The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may also be involved in the regulation of non-rapid eye movement sleep. ENSG00000107317 prostaglandin D2 synthase
PALM 5064 This gene encodes a member of the paralemmin protein family. The product of this gene is a prenylated and palmitoylated phosphoprotein that associates with the cytoplasmic face of plasma membranes and is implicated in plasma membrane dynamics in neurons and other cell types. Several alternatively spliced transcript variants have been identified, but the full-length nature of only two transcript variants has been determined. ENSG00000099864 paralemmin
ARHGEF10 9639 This gene encodes a Rho guanine nucleotide exchange factor (GEF). Rho GEFs regulate the activity of small Rho GTPases by stimulating the exchange of guanine diphosphate (GDP) for guanine triphosphate (GTP) and may play a role in neural morphogenesis. Mutations in this gene are associated with slowed nerve conduction velocity (SNCV). Alternative splicing results in multiple transcript variants. ENSG00000104728 Rho guanine nucleotide exchange factor 10
DPYSL3 1809 NA ENSG00000113657 dihydropyrimidinase like 3
CALU 813 The product of this gene is a calcium-binding protein localized in the endoplasmic reticulum (ER) and it is involved in such ER functions as protein folding and sorting. This protein belongs to a family of multiple EF-hand proteins (CERC) that include reticulocalbin, ERC-55, and Cab45 and the product of this gene. Alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000128595 calumenin
PLAT 5327 This gene encodes tissue-type plasminogen activator, a secreted serine protease that converts the proenzyme plasminogen to plasmin, a fibrinolytic enzyme. The encoded preproprotein is proteolytically processed by plasmin or trypsin to generate heavy and light chains. These chains associate via disulfide linkages to form the heterodimeric enzyme. This enzyme plays a role in cell migration and tissue remodeling. Increased enzymatic activity causes hyperfibrinolysis, which manifests as excessive bleeding, while decreased activity leads to hypofibrinolysis, which can result in thrombosis or embolism. Alternative splicing of this gene results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. ENSG00000104368 plasminogen activator, tissue type
NR2F2 7026 This gene encodes a member of the steroid thyroid hormone superfamily of nuclear receptors. The encoded protein is a ligand inducible transcription factor that is involved in the regulation of many different genes. Alternate splicing results in multiple transcript variants. ENSG00000185551 nuclear receptor subfamily 2 group F member 2
RASSF2 9770 This gene encodes a protein that contains a Ras association domain. Similar to its cattle and sheep counterparts, this gene is located near the prion gene. Two alternatively spliced transcripts encoding the same isoform have been reported. ENSG00000101265 Ras association domain family member 2
PLPP3 8613 The protein encoded by this gene is a member of the phosphatidic acid phosphatase (PAP) family. PAPs convert phosphatidic acid to diacylglycerol, and function in de novo synthesis of glycerolipids as well as in receptor-activated signal transduction mediated by phospholipase D. This protein is a membrane glycoprotein localized at the cell plasma membrane. It has been shown to actively hydrolyze extracellular lysophosphatidic acid and short-chain phosphatidic acid. The expression of this gene is found to be enhanced by epidermal growth factor in Hela cells. ENSG00000162407 phospholipid phosphatase 3
SHANK3 ENSG00000251322 NA ENSG00000251322 SH3 and multiple ankyrin repeat domains 3
LAPTM4A 9741 This gene encodes a protein that has four predicted transmembrane domains. The function of this gene has not yet been determined; however, studies in the mouse homolog suggest a role in the transport of small molecules across endosomal and lysosomal membranes. ENSG00000068697 lysosomal protein transmembrane 4 alpha
TIMP2 7077 This gene is a member of the TIMP gene family. The proteins encoded by this gene family are natural inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix. In addition to an inhibitory role against metalloproteinases, the encoded protein has a unique role among TIMP family members in its ability to directly suppress the proliferation of endothelial cells. As a result, the encoded protein may be critical to the maintenance of tissue homeostasis by suppressing the proliferation of quiescent tissues in response to angiogenic factors, and by inhibiting protease activity in tissues undergoing remodelling of the extracellular matrix. ENSG00000035862 TIMP metallopeptidase inhibitor 2
COL6A1 1291 The collagens are a superfamily of proteins that play a role in maintaining the integrity of various tissues. Collagens are extracellular matrix proteins and have a triple-helical domain as their common structural element. Collagen VI is a major structural component of microfibrils. The basic structural unit of collagen VI is a heterotrimer of the alpha1(VI), alpha2(VI), and alpha3(VI) chains. The alpha2(VI) and alpha3(VI) chains are encoded by the COL6A2 and COL6A3 genes, respectively. The protein encoded by this gene is the alpha 1 subunit of type VI collagen (alpha1(VI) chain). Mutations in the genes that code for the collagen VI subunits result in the autosomal dominant disorder, Bethlem myopathy. ENSG00000142156 collagen type VI alpha 1
FKBP10 60681 The protein encoded by this gene belongs to the FKBP-type peptidyl-prolyl cis/trans isomerase (PPIase) family. This protein localizes to the endoplasmic reticulum and acts as a molecular chaperone. Alternatively spliced variants encoding different isoforms have been reported, but their biological validity has not been determined. ENSG00000141756 FK506 binding protein 10
SLC1A3 6507 This gene encodes a member of a high-affinity glutamate transporter family. This gene functions in the termination of excitatory neurotransmission in the central nervous system. Mutations are associated with episodic ataxia, Type 6. Alternative splicing results in multiple transcript variants. ENSG00000079215 solute carrier family 1 member 3
YWHAE 7531 This gene product belongs to the 14-3-3 family of proteins which mediate signal transduction by binding to phosphoserine-containing proteins. This highly conserved protein family is found in both plants and mammals, and this protein is 100% identical to the mouse ortholog. It interacts with CDC25 phosphatases, RAF1 and IRS1 proteins, suggesting its role in diverse biochemical activities related to signal transduction, such as cell division and regulation of insulin sensitivity. It has also been implicated in the pathogenesis of small cell lung cancer. Two transcript variants, one protein-coding and the other non-protein-coding, have been found for this gene. ENSG00000108953 tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein epsilon
NEB 4703 This gene encodes nebulin, a giant protein component of the cytoskeletal matrix that coexists with the thick and thin filaments within the sarcomeres of skeletal muscle. In most vertebrates, nebulin accounts for 3 to 4% of the total myofibrillar protein. The encoded protein contains approximately 30-amino acid long modules that can be classified into 7 types and other repeated modules. Protein isoform sizes vary from 600 to 800 kD due to alternative splicing that is tissue-, species-, and developmental stage-specific. Of the 183 exons in the nebulin gene, at least 43 are alternatively spliced, although exons 143 and 144 are not found in the same transcript. Of the several thousand transcript variants predicted for nebulin, the RefSeq Project has decided to create three representative RefSeq records. Mutations in this gene are associated with recessive nemaline myopathy. ENSG00000183091 nebulin
SLC2A3 6515 NA ENSG00000059804 solute carrier family 2 member 3
# Save the Ensembl IDs queried for factor 6, one ID per line.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_", 6, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
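
Some rows in the annotation tables, such as SHANK3, carry an Ensembl ID in the X_id column and have no summary. The sketch below shows one way such rows could be flagged before a factor is interpreted. It assumes that an X_id beginning with "ENSG" means the query matched a record without an Entrez gene ID; that reading is an assumption, not something the output states. The helper name `flag_unannotated` and the toy data frame are hypothetical; in the analysis itself the input would be `as.data.frame(out)`.

```r
# Toy rows taken from the Factor 6 table (the ANXA1 summary is abbreviated).
annot <- data.frame(
  symbol  = c("ANXA1", "DPYSL3", "SHANK3"),
  X_id    = c("301", "1809", "ENSG00000251322"),
  summary = c("This gene encodes a membrane-localized protein that binds phospholipids.",
              NA_character_, NA_character_),
  query   = c("ENSG00000135046", "ENSG00000113657", "ENSG00000251322"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: adds two logical columns marking rows that lack
# an Entrez-style identifier or a gene summary.
flag_unannotated <- function(df) {
  df$no_entrez  <- grepl("^ENSG", df$X_id)  # X_id fell back to an Ensembl ID
  df$no_summary <- is.na(df$summary)        # no summary text returned
  df
}

flagged <- flag_unannotated(annot)

# Genes to treat with caution when reading the annotation table.
flagged[flagged$no_entrez | flagged$no_summary, c("symbol", "query")]
```

Both flags are kept as separate columns because they fail independently: DPYSL3 has an Entrez ID but no summary, while SHANK3 lacks both.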

Factor 7 Annotations

# Annotate the top genes of factor 7 (Ensembl gene IDs) with name, summary and symbol.
out <- mygene::queryMany(gene_list[7,], scopes = "ensembl.gene",
                         fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name X_id summary symbol query notfound
hemoglobin subunit beta 3043 The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin cause beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. HBB ENSG00000244734 NA
myosin, heavy chain 6, cardiac muscle, alpha 4624 Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. MYH6 ENSG00000197616 NA
NDRG family member 2 57447 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that may play a role in neurite outgrowth. This gene may be involved in glioblastoma carcinogenesis. Several alternatively spliced transcript variants of this gene have been described, but the full-length nature of some of these variants has not been determined. NDRG2 ENSG00000165795 NA
glutamate-ammonia ligase 2752 The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. GLUL ENSG00000135821 NA
creatine kinase, M-type 1158 The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. CKM ENSG00000104879 NA
natriuretic peptide A 4878 The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. NPPA ENSG00000175206 NA
apolipoprotein D 347 This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. APOD ENSG00000189058 NA
actinin alpha 2 88 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a muscle-specific, alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Several transcript variants encoding different isoforms have been found for this gene. ACTN2 ENSG00000077522 NA
family with sequence similarity 107 member A 11170 NA FAM107A ENSG00000168309 NA
glial fibrillary acidic protein 2670 This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. GFAP ENSG00000131095 NA
stearoyl-CoA desaturase 6319 This gene encodes an enzyme involved in fatty acid biosynthesis, primarily the synthesis of oleic acid. The protein belongs to the fatty acid desaturase family and is an integral membrane protein located in the endoplasmic reticulum. Transcripts of approximately 3.9 and 5.2 kb, differing only by alternative polyadenylation signals, have been detected. A gene encoding a similar enzyme is located on chromosome 4 and a pseudogene of this gene is located on chromosome 17. SCD ENSG00000099194 NA
pleckstrin homology domain containing B1 58473 NA PLEKHB1 ENSG00000021300 NA
kinesin family member 1A 547 The protein encoded by this gene is a member of the kinesin family and functions as an anterograde motor protein that transports membranous organelles along axonal microtubules. Mutations at this locus have been associated with spastic paraplegia-30 and hereditary sensory neuropathy IIC. Alternatively spliced transcript variants encoding distinct isoforms have been described. KIF1A ENSG00000130294 NA
glycerol-3-phosphate dehydrogenase 1 2819 This gene encodes a member of the NAD-dependent glycerol-3-phosphate dehydrogenase family. The encoded protein plays a critical role in carbohydrate and lipid metabolism by catalyzing the reversible conversion of dihydroxyacetone phosphate (DHAP) and reduced nicotinamide adenine dinucleotide (NADH) to glycerol-3-phosphate (G3P) and NAD+. The encoded cytosolic protein and mitochondrial glycerol-3-phosphate dehydrogenase also form a glycerol phosphate shuttle that facilitates the transfer of reducing equivalents from the cytosol to mitochondria. Mutations in this gene are a cause of transient infantile hypertriglyceridemia. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. GPD1 ENSG00000167588 NA
thrombospondin 1 7057 The protein encoded by this gene is a subunit of a disulfide-linked homotrimeric protein. This protein is an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein can bind to fibrinogen, fibronectin, laminin, type V collagen and integrins alpha-V/beta-1. This protein has been shown to play roles in platelet aggregation, angiogenesis, and tumorigenesis. THBS1 ENSG00000137801 NA
lipoprotein lipase 4023 LPL encodes lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue. LPL functions as a homodimer, and has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Severe mutations that cause LPL deficiency result in type I hyperlipoproteinemia, while less extreme mutations in LPL are linked to many disorders of lipoprotein metabolism. LPL ENSG00000175445 NA
desmin 1674 This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. DES ENSG00000175084 NA
glutathione peroxidase 3 2878 This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. GPX3 ENSG00000211445 NA
tropomyosin 2 (beta) 7169 This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. TPM2 ENSG00000198467 NA
hemoglobin subunit alpha 1 3039 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. HBA1 ENSG00000206172 NA
myosin light chain 7 58498 NA MYL7 ENSG00000106631 NA
aldolase, fructose-bisphosphate A 226 The protein encoded by this gene, Aldolase A (fructose-bisphosphate aldolase), is a glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Three aldolase isozymes (A, B, and C), encoded by three different genes, are differentially expressed during development. Aldolase A is found in the developing embryo and is produced in even greater amounts in adult muscle. Aldolase A expression is repressed in adult liver, kidney and intestine and similar to aldolase C levels in brain and other nervous tissue. Aldolase A deficiency has been associated with myopathy and hemolytic anemia. Alternative splicing and alternative promoter usage results in multiple transcript variants. Related pseudogenes have been identified on chromosomes 3 and 10. ALDOA ENSG00000149925 NA
apolipoprotein E 348 The protein encoded by this gene is a major apoprotein of the chylomicron. It binds to a specific liver and peripheral cell receptor, and is essential for the normal catabolism of triglyceride-rich lipoprotein constituents. This gene maps to chromosome 19 in a cluster with the related apolipoprotein C1 and C2 genes. Mutations in this gene result in familial dysbetalipoproteinemia, or type III hyperlipoproteinemia (HLP III), in which increased plasma cholesterol and triglycerides are the consequence of impaired clearance of chylomicron and VLDL remnants. Alternative splicing results in multiple transcript variants. APOE ENSG00000130203 NA
QKI, KH domain containing, RNA binding 9444 The protein encoded by this gene is an RNA-binding protein that regulates pre-mRNA splicing, export of mRNAs from the nucleus, protein translation, and mRNA stability. The encoded protein is involved in myelinization and oligodendrocyte differentiation and may play a role in schizophrenia. Multiple transcript variants encoding different isoforms have been found for this gene. QKI ENSG00000112531 NA
actin, alpha 1, skeletal muscle 58 The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. ACTA1 ENSG00000143632 NA
myelin basic protein 4155 The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. MBP ENSG00000197971 NA
acetyl-CoA carboxylase beta 32 Acetyl-CoA carboxylase (ACC) is a complex multifunctional enzyme system. ACC is a biotin-containing enzyme which catalyzes the carboxylation of acetyl-CoA to malonyl-CoA, the rate-limiting step in fatty acid synthesis. ACC-beta is thought to control fatty acid oxidation by means of the ability of malonyl-CoA to inhibit carnitine-palmitoyl-CoA transferase I, the rate-limiting step in fatty acid uptake and oxidation by mitochondria. ACC-beta may be involved in the regulation of fatty acid oxidation, rather than fatty acid biosynthesis. There is evidence for the presence of two ACC-beta isoforms. ACACB ENSG00000076555 NA
bridging integrator 1 274 This gene encodes several isoforms of a nucleocytoplasmic adaptor protein, one of which was initially identified as a MYC-interacting protein with features of a tumor suppressor. Isoforms that are expressed in the central nervous system may be involved in synaptic vesicle endocytosis and may interact with dynamin, synaptojanin, endophilin, and clathrin. Isoforms that are expressed in muscle and ubiquitously expressed isoforms localize to the cytoplasm and nucleus and activate a caspase-independent apoptotic process. Studies in mouse suggest that this gene plays an important role in cardiac muscle development. Alternate splicing of the gene results in several transcript variants encoding different isoforms. Aberrant splice variants expressed in tumor cell lines have also been described. BIN1 ENSG00000136717 NA
troponin T2, cardiac type 7139 The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms; however, the full-length nature of some of these variants has not yet been determined. TNNT2 ENSG00000118194 NA
tropomyosin 1 (alpha) 7168 This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. TPM1 ENSG00000140416 NA
fatty acid binding protein 3 2170 The intracellular fatty acid-binding proteins (FABPs) belong to a multigene family. FABPs are divided into at least three distinct types, namely the hepatic-, intestinal- and cardiac-type. They form 14-15 kDa proteins and are thought to participate in the uptake, intracellular metabolism and/or transport of long-chain fatty acids. They may also be responsible for the modulation of cell growth and proliferation. Fatty acid-binding protein 3 gene contains four exons and its function is to arrest growth of mammary epithelial cells. This gene is a candidate tumor suppressor gene for human breast cancer. Alternative splicing results in multiple transcript variants. FABP3 ENSG00000121769 NA
creatine kinase B 1152 The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in brain as well as in other tissues, and as a heterodimer with a similar muscle isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. A pseudogene of this gene has been characterized. CKB ENSG00000166165 NA
actin, alpha, cardiac muscle 1 70 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). ACTC1 ENSG00000159251 NA
keratin 4 3851 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. KRT4 ENSG00000170477 NA
myosin light chain 2 4633 This gene encodes the regulatory light chain associated with cardiac myosin beta (or slow) heavy chain. Ca2+ triggers the phosphorylation of regulatory light chain that in turn triggers contraction. Mutations in this gene are associated with mid-left ventricular chamber type hypertrophic cardiomyopathy. MYL2 ENSG00000111245 NA
actin, beta 60 This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. ACTB ENSG00000075624 NA
ATP binding cassette subfamily A member 2 20 The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intracellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the ABC1 subfamily. Members of the ABC1 subfamily comprise the only major ABC subfamily found exclusively in multicellular eukaryotes. This protein is highly expressed in brain tissue and may play a role in macrophage lipid metabolism and neural development. Two transcript variants encoding different isoforms have been found for this gene. ABCA2 ENSG00000107331 NA
malate dehydrogenase 1 4190 This gene encodes an enzyme that catalyzes the NAD/NADH-dependent, reversible oxidation of malate to oxaloacetate in many metabolic pathways, including the citric acid cycle. Two main isozymes are known to exist in eukaryotic cells: one is found in the mitochondrial matrix and the other in the cytoplasm. This gene encodes the cytosolic isozyme, which plays a key role in the malate-aspartate shuttle that allows malate to pass through the mitochondrial membrane to be transformed into oxaloacetate for further cellular processes. Alternatively spliced transcript variants have been found for this gene. A recent study showed that a C-terminally extended isoform is produced by use of an alternative in-frame translation termination codon via a stop codon readthrough mechanism, and that this isoform is localized in the peroxisomes. Pseudogenes have been identified on chromosomes X and 6. MDH1 ENSG00000014641 NA
peptidyl arginine deiminase 2 11240 This gene encodes a member of the peptidyl arginine deiminase family of enzymes, which catalyze the post-translational deimination of proteins by converting arginine residues into citrullines in the presence of calcium ions. The family members have distinct substrate specificities and tissue-specific expression patterns. The type II enzyme is the most widely expressed family member. Known substrates for this enzyme include myelin basic protein in the central nervous system and vimentin in skeletal muscle and macrophages. This enzyme is thought to play a role in the onset and progression of neurodegenerative human disorders, including Alzheimer disease and multiple sclerosis, and it has also been implicated in glaucoma pathogenesis. This gene exists in a cluster with four other paralogous genes. PADI2 ENSG00000117115 NA
NPPA antisense RNA 1 ENSG00000242349 NA NPPA-AS1 ENSG00000242349 NA
glutathione peroxidase 4 2879 This gene encodes a member of the glutathione peroxidase protein family. Glutathione peroxidase catalyzes the reduction of hydrogen peroxide, organic hydroperoxide, and lipid peroxides by reduced glutathione and functions in the protection of cells against oxidative damage. Human plasma glutathione peroxidase has been shown to be a selenium-containing enzyme and the UGA codon is translated into a selenocysteine. The encoded protein has been identified as a moonlighting protein based on its ability to serve dual functions as a peroxidase as well as a structural protein in mature spermatozoa. Through alternative splicing and transcription initiation, rat produces proteins that localize to the nucleus, mitochondrion, and cytoplasm. In humans, alternative transcription initiation and the cleavage sites of the mitochondrial and nuclear transit peptides need to be experimentally verified. Alternative splicing results in multiple transcript variants. GPX4 ENSG00000167468 NA
tropomyosin 4 7171 This gene encodes a member of the tropomyosin family of actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosins are dimers of coiled-coil proteins that polymerize end-to-end along the major groove in most actin filaments. They provide stability to the filaments and regulate access of other actin-binding proteins. In muscle cells, they regulate muscle contraction by controlling the binding of myosin heads to the actin filament. Multiple transcript variants encoding different isoforms have been found for this gene. TPM4 ENSG00000167460 NA
ankyrin repeat domain 1 27063 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. ANKRD1 ENSG00000148677 NA
stearoyl-CoA desaturase 5 79966 Stearoyl-CoA desaturase (SCD; EC 1.14.99.5) is an integral membrane protein of the endoplasmic reticulum that catalyzes the formation of monounsaturated fatty acids from saturated fatty acids. SCD may be a key regulator of energy metabolism with a role in obesity and dyslipidemia. Four SCD isoforms, Scd1 through Scd4, have been identified in mouse. In contrast, only 2 SCD isoforms, SCD1 (MIM 604031) and SCD5, have been identified in human. SCD1 shares about 85% amino acid identity with all 4 mouse SCD isoforms, as well as with rat Scd1 and Scd2. In contrast, SCD5 shares limited homology with the rodent SCDs and appears to be unique to primates (Wang et al., 2005 [PubMed 15907797]). SCD5 ENSG00000145284 NA
protamine 2 5620 Protamines substitute for histones in the chromatin of sperm during the haploid phase of spermatogenesis, and are the major DNA-binding proteins in the nucleus of sperm in many vertebrates. They package the sperm DNA into a highly condensed complex in a volume less than 5% of a somatic cell nucleus. Many mammalian species have only one protamine (protamine 1); however, a few species, including human and mouse, have two. This gene encodes protamine 2, which is cleaved to give rise to a family of protamine 2 peptides. Alternatively spliced transcript variants have also been found for this gene. PRM2 ENSG00000122304 NA
fatty acid synthase 2194 The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. FASN ENSG00000169710 NA
heat shock protein family B (small) member 8 26353 The protein encoded by this gene belongs to the superfamily of small heat-shock proteins containing a conservative alpha-crystallin domain at the C-terminal part of the molecule. The expression of this gene is induced by estrogen in estrogen receptor-positive breast cancer cells, and this protein also functions as a chaperone in association with Bag3, a stimulator of macroautophagy. Thus, this gene appears to be involved in regulation of cell proliferation, apoptosis, and carcinogenesis, and mutations in this gene have been associated with different neuromuscular diseases, including Charcot-Marie-Tooth disease. HSPB8 ENSG00000152137 NA
titin 7273 This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. TTN ENSG00000155657 NA
ankyrin 2, neuronal 287 This gene encodes a member of the ankyrin family of proteins that link the integral membrane proteins to the underlying spectrin-actin cytoskeleton. Ankyrins play key roles in activities such as cell motility, activation, proliferation, contact and the maintenance of specialized membrane domains. Most ankyrins are typically composed of three structural domains: an amino-terminal domain containing multiple ankyrin repeats; a central region with a highly conserved spectrin binding domain; and a carboxy-terminal regulatory domain which is the least conserved and subject to variation. The protein encoded by this gene is required for targeting and stability of Na/Ca exchanger 1 in cardiomyocytes. Mutations in this gene cause long QT syndrome 4 and cardiac arrhythmia syndrome. Multiple transcript variants encoding different isoforms have been described. ANK2 ENSG00000145362 NA
hemoglobin subunit alpha 2 3040 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. HBA2 ENSG00000188536 NA
NA NA NA NA ENSG00000256545 TRUE
actin gamma 1 71 Actins are highly conserved proteins that are involved in various types of cell motility, and maintenance of the cytoskeleton. In vertebrates, three main groups of actin isoforms, alpha, beta and gamma have been identified. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton, and as mediators of internal cell motility. Actin, gamma 1, encoded by this gene, is a cytoplasmic actin found in non-muscle cells. Mutations in this gene are associated with DFNA20/26, a subtype of autosomal dominant non-syndromic sensorineural progressive hearing loss. Alternative splicing results in multiple transcript variants. ACTG1 ENSG00000184009 NA
metallothionein 3 4504 NA MT3 ENSG00000087250 NA
mitogen-activated protein kinase 8 interacting protein 1 9479 This gene encodes a regulator of the pancreatic beta-cell function. It is highly similar to JIP-1, a mouse protein known to be a regulator of c-Jun amino-terminal kinase (Mapk8). This protein has been shown to prevent MAPK8 mediated activation of transcription factors, and to decrease IL-1 beta and MAP kinase kinase 1 (MEKK1) induced apoptosis in pancreatic beta cells. This protein also functions as a DNA-binding transactivator of the glucose transporter GLUT2. RE1-silencing transcription factor (REST) is reported to repress the expression of this gene in insulin-secreting beta cells. This gene is found to be mutated in a type 2 diabetes family, and thus is thought to be a susceptibility gene for type 2 diabetes. MAPK8IP1 ENSG00000121653 NA
myosin binding protein C, cardiac 4607 MYBPC3 encodes the cardiac isoform of myosin-binding protein C. Myosin-binding protein C is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. MYBPC3, the cardiac isoform, is expressed exclusively in heart muscle. Regulatory phosphorylation of the cardiac isoform in vivo by cAMP-dependent protein kinase (PKA) upon adrenergic stimulation may be linked to modulation of cardiac contraction. Mutations in MYBPC3 are one cause of familial hypertrophic cardiomyopathy. MYBPC3 ENSG00000134571 NA
AE binding protein 1 165 This gene encodes a member of carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. AEBP1 ENSG00000106624 NA
keratin 13 3860 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. KRT13 ENSG00000171401 NA
small proline rich protein 3 6707 NA SPRR3 ENSG00000163209 NA
NDUFA4, mitochondrial complex associated 4697 The protein encoded by this gene belongs to the complex I 9kDa subunit family. Mammalian complex I of mitochondrial respiratory chain is composed of 45 different subunits. This protein has NADH dehydrogenase activity and oxidoreductase activity. It transfers electrons from NADH to the respiratory chain. The immediate electron acceptor for the enzyme is believed to be ubiquinone. NDUFA4 ENSG00000189043 NA
ATPase Na+/K+ transporting subunit alpha 2 477 The protein encoded by this gene belongs to the family of P-type cation transport ATPases, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The catalytic subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes an alpha 2 subunit. Mutations in this gene result in familial basilar or hemiplegic migraines, and in a rare syndrome known as alternating hemiplegia of childhood. ATP1A2 ENSG00000018625 NA
myosin, heavy chain 7, cardiac muscle, beta 4625 Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. MYH7 ENSG00000092054 NA
serpin family E member 1 5054 This gene encodes a member of the serine proteinase inhibitor (serpin) superfamily. This member is the principal inhibitor of tissue plasminogen activator (tPA) and urokinase (uPA), and hence is an inhibitor of fibrinolysis. Defects in this gene are the cause of plasminogen activator inhibitor-1 deficiency (PAI-1 deficiency), and high concentrations of the gene product are associated with thrombophilia. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. SERPINE1 ENSG00000106366 NA
alcohol dehydrogenase 1B (class I), beta polypeptide 125 The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. ADH1B ENSG00000196616 NA
kinesin family member 5C 3800 The protein encoded by this gene is a kinesin heavy chain subunit involved in the transport of cargo within the central nervous system. The encoded protein, which acts as a tetramer by associating with another heavy chain and two light chains, interacts with protein kinase CK2. Mutations in this gene have been associated with complex cortical dysplasia with other brain malformations-2. Two transcript variants, one protein-coding and the other non-protein coding, have been found for this gene. KIF5C ENSG00000168280 NA
prune homolog 2 158471 The protein encoded by this gene belongs to the B-cell CLL/lymphoma 2 and adenovirus E1B 19 kDa interacting family, whose members play roles in many cellular processes including apoptosis, cell transformation, and synaptic function. Several functions for this protein have been demonstrated including suppression of Ras homolog family member A activity, which results in reduced stress fiber formation and suppression of oncogenic cellular transformation. A high molecular weight isoform of this protein has also been shown to colocalize with Adaptor protein complex 2, beta-Adaptin and endodermal markers, suggesting an involvement in post-endocytic trafficking. In prostate cancer cells, this gene acts as a tumor suppressor and its expression is regulated by prostate cancer antigen 3, a non-protein coding gene on the opposite DNA strand in an intron of this gene. Prostate cancer antigen 3 regulates levels of this gene through formation of a double-stranded RNA that undergoes adenosine deaminase acting on RNA-dependent adenosine-to-inosine RNA editing. Alternative splicing results in multiple transcript variants. PRUNE2 ENSG00000106772 NA
cytochrome c oxidase subunit 7A1 1346 Cytochrome c oxidase (COX), the terminal component of the mitochondrial respiratory chain, catalyzes the electron transfer from reduced cytochrome c to oxygen. This component is a heteromeric complex consisting of 3 catalytic subunits encoded by mitochondrial genes and multiple structural subunits encoded by nuclear genes. The mitochondrially-encoded subunits function in electron transfer, and the nuclear-encoded subunits may function in the regulation and assembly of the complex. This nuclear gene encodes polypeptide 1 (muscle isoform) of subunit VIIa and the polypeptide 1 is present only in muscle tissues. Other polypeptides of subunit VIIa are present in both muscle and nonmuscle tissues, and are encoded by different genes. COX7A1 ENSG00000161281 NA
synaptopodin 11346 Synaptopodin is an actin-associated protein that may play a role in actin-based cell shape and motility. The name synaptopodin derives from the protein’s associations with postsynaptic densities and dendritic spines and with renal podocytes (Mundel et al., 1997 [PubMed 9314539]). SYNPO ENSG00000171992 NA
protein phosphatase 1 regulatory inhibitor subunit 1B 84152 This gene encodes a bifunctional signal transduction molecule. Dopaminergic and glutamatergic receptor stimulation regulates its phosphorylation and function as a kinase or phosphatase inhibitor. As a target for dopamine, this gene may serve as a therapeutic target for neurologic and psychiatric disorders. Multiple transcript variants encoding different isoforms have been found for this gene. PPP1R1B ENSG00000131771 NA
spectrin beta, non-erythrocytic 1 6711 Spectrin is an actin crosslinking and molecular scaffold protein that links the plasma membrane to the actin cytoskeleton, and functions in the determination of cell shape, arrangement of transmembrane proteins, and organization of organelles. It is composed of two antiparallel dimers of alpha- and beta- subunits. This gene is one member of a family of beta-spectrin genes. The encoded protein contains an N-terminal actin-binding domain, and 17 spectrin repeats which are involved in dimer formation. Multiple transcript variants encoding different isoforms have been found for this gene. SPTBN1 ENSG00000115306 NA
myosin binding protein C, slow type 4604 This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. MYBPC1 ENSG00000196091 NA
pyridoxal (pyridoxine, vitamin B6) kinase 8566 The protein encoded by this gene phosphorylates vitamin B6, a step required for the conversion of vitamin B6 to pyridoxal-5-phosphate, an important cofactor in intermediary metabolism. The encoded protein is cytoplasmic and probably acts as a homodimer. Alternatively spliced transcript variants have been described, but their biological validity has not been determined. PDXK ENSG00000160209 NA
uncharacterized LOC105372824 105372824 NA LOC105372824 ENSG00000160209 NA
formin like 2 114793 This gene encodes a formin-related protein. Formin-related proteins have been implicated in morphogenesis, cytokinesis, and cell polarity. Alternatively spliced transcript variants encoding different isoforms have been described but their full-length nature has yet to be determined. FMNL2 ENSG00000157827 NA
phosphatidylethanolamine binding protein 1 5037 This gene encodes a member of the phosphatidylethanolamine-binding family of proteins and has been shown to modulate multiple signaling pathways, including the MAP kinase (MAPK), NF-kappa B, and glycogen synthase kinase-3 (GSK-3) signaling pathways. The encoded protein can be further processed to form a smaller cleavage product, hippocampal cholinergic neurostimulating peptide (HCNP), which may be involved in neural development. This gene has been implicated in numerous human cancers and may act as a metastasis suppressor gene. Multiple pseudogenes of this gene have been identified in the genome. PEBP1 ENSG00000089220 NA
phosphofurin acidic cluster sorting protein 2 23241 NA PACS2 ENSG00000179364 NA
nebulette 10529 This gene encodes a nebulin like protein that is abundantly expressed in cardiac muscle. The encoded protein binds actin and interacts with thin filaments and Z-line associated proteins in striated muscle. This protein may be involved in cardiac myofibril assembly. A shorter isoform of this protein termed LIM nebulette is expressed in non-muscle cells and may function as a component of focal adhesion complexes. Alternate splicing results in multiple transcript variants. NEBL ENSG00000078114 NA
NA ENSG00000266844 NA RP11-862L9.3 ENSG00000266844 NA
glycerol-3-phosphate acyltransferase, mitochondrial 57678 This gene encodes a mitochondrial enzyme which prefers saturated fatty acids as its substrate for the synthesis of glycerolipids. This metabolic pathway’s first step is catalyzed by the encoded enzyme. Two forms for this enzyme exist, one in the mitochondria and one in the endoplasmic reticulum. Two alternatively spliced transcript variants have been described for this gene. GPAM ENSG00000119927 NA
calponin 1 1264 NA CNN1 ENSG00000130176 NA
spermatogenesis associated 20 64847 NA SPATA20 ENSG00000006282 NA
actin, alpha 2, smooth muscle, aorta 59 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. ACTA2 ENSG00000107796 NA
phospholamban 5350 The protein encoded by this gene is found as a pentamer and is a major substrate for the cAMP-dependent protein kinase in cardiac muscle. The encoded protein is an inhibitor of cardiac muscle sarcoplasmic reticulum Ca(2+)-ATPase in the unphosphorylated state, but inhibition is relieved upon phosphorylation of the protein. The subsequent activation of the Ca(2+) pump leads to enhanced muscle relaxation rates, thereby contributing to the inotropic response elicited in heart by beta-agonists. The encoded protein is a key regulator of cardiac diastolic function. Mutations in this gene are a cause of inherited human dilated cardiomyopathy with refractory congestive heart failure, and also familial hypertrophic cardiomyopathy. PLN ENSG00000198523 NA
nebulin related anchoring protein 4892 NA NRAP ENSG00000197893 NA
SURP and G-patch domain containing 2 10147 This gene encodes a member of the arginine/serine-rich family of splicing factors. The encoded protein functions in mRNA processing. Alternatively spliced transcript variants have been described. SUGP2 ENSG00000064607 NA
MT-CO1 pseudogene 12 ENSG00000237973 NA MTCO1P12 ENSG00000237973 NA
glyceraldehyde-3-phosphate dehydrogenase 2597 This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. GAPDH ENSG00000111640 NA
solute carrier family 25 member 4 291 This gene is a member of the mitochondrial carrier subfamily of solute carrier protein genes. The product of this gene functions as a gated pore that translocates ADP from the cytoplasm into the mitochondrial matrix and ATP from the mitochondrial matrix into the cytoplasm. The protein forms a homodimer embedded in the inner mitochondrial membrane. Mutations in this gene have been shown to result in autosomal dominant progressive external ophthalmoplegia and familial hypertrophic cardiomyopathy. SLC25A4 ENSG00000151729 NA
nuclear paraspeckle assembly transcript 1 (non-protein coding) 283131 This gene produces a long non-coding RNA (lncRNA) transcribed from the multiple endocrine neoplasia locus. This lncRNA is retained in the nucleus where it forms the core structural component of the paraspeckle sub-organelles. It may act as a transcriptional regulator for numerous genes, including some genes involved in cancer progression. NEAT1 ENSG00000245532 NA
protamine 1 5619 NA PRM1 ENSG00000175646 NA
growth arrest specific 7 8522 Growth arrest-specific 7 is expressed primarily in terminally differentiated brain cells and predominantly in mature cerebellar Purkinje neurons. GAS7 plays a putative role in neuronal development. Several transcript variants encoding proteins which vary in the N-terminus have been described. GAS7 ENSG00000007237 NA
enolase 1 2023 This gene encodes alpha-enolase, one of three enolase isoenzymes found in mammals. Each isoenzyme is a homodimer composed of 2 alpha, 2 gamma, or 2 beta subunits, and functions as a glycolytic enzyme. Alpha-enolase in addition, functions as a structural lens protein (tau-crystallin) in the monomeric form. Alternative splicing of this gene results in a shorter isoform that has been shown to bind to the c-myc promoter and function as a tumor suppressor. Several pseudogenes have been identified, including one on the long arm of chromosome 1. Alpha-enolase has also been identified as an autoantigen in Hashimoto encephalopathy. ENO1 ENSG00000074800 NA
quinoid dihydropteridine reductase 5860 This gene encodes the enzyme dihydropteridine reductase, which catalyzes the NADH-mediated reduction of quinonoid dihydrobiopterin. This enzyme is an essential component of the pterin-dependent aromatic amino acid hydroxylating systems. Mutations in this gene resulting in QDPR deficiency include aberrant splicing, amino acid substitutions, insertions, or premature terminations. Dihydropteridine reductase deficiency presents as atypical phenylketonuria due to insufficient production of biopterin, a cofactor for phenylalanine hydroxylase. QDPR ENSG00000151552 NA
sirtuin 2 22933 This gene encodes a member of the sirtuin family of proteins, homologs to the yeast Sir2 protein. Members of the sirtuin family are characterized by a sirtuin core domain and grouped into four classes. The functions of human sirtuins have not yet been determined; however, yeast sirtuin proteins are known to regulate epigenetic gene silencing and suppress recombination of rDNA. Studies suggest that the human sirtuins may function as intracellular regulatory proteins with mono-ADP-ribosyltransferase activity. The protein encoded by this gene is included in class I of the sirtuin family. Several transcript variants result from alternative splicing of this gene. SIRT2 ENSG00000068903 NA
5-oxoprolinase (ATP-hydrolysing) 26873 The protein encoded by this gene acts as a homodimer, using ATP hydrolysis to catalyze the conversion of 5-oxo-L-proline to L-glutamate. Defects in this gene are a cause of 5-oxoprolinase deficiency (OPLAHD). OPLAH ENSG00000178814 NA
JunB proto-oncogene, AP-1 transcription factor subunit 3726 NA JUNB ENSG00000171223 NA
carnitine O-acetyltransferase 1384 This gene encodes carnitine acetyltransferase (CRAT), which is a key enzyme in the metabolic pathway in mitochondria, peroxisomes and endoplasmic reticulum. CRAT catalyzes the reversible transfer of acyl groups from an acyl-CoA thioester to carnitine and regulates the ratio of acylCoA/CoA in the subcellular compartments. Two transcript variants encoding different isoforms have been found for this gene. CRAT ENSG00000095321 NA
amyloid beta precursor like protein 1 333 This gene encodes a member of the highly conserved amyloid precursor protein gene family. The encoded protein is a membrane-associated glycoprotein that is cleaved by secretases in a manner similar to amyloid beta A4 precursor protein cleavage. This cleavage liberates an intracellular cytoplasmic fragment that may act as a transcriptional activator. The encoded protein may also play a role in synaptic maturation during cortical development. Alternatively spliced transcript variants encoding different isoforms have been described. APLP1 ENSG00000105290 NA
kinesin family member 5A 3798 This gene encodes a member of the kinesin family of proteins. Members of this family are part of a multisubunit complex that functions as a microtubule motor in intracellular organelle transport. Mutations in this gene cause autosomal dominant spastic paraplegia 10. KIF5A ENSG00000155980 NA
ribonuclease A family member 1, pancreatic 6035 This gene encodes a member of the pancreatic-type of secretory ribonucleases, a subset of the ribonuclease A superfamily. The encoded endonuclease cleaves internal phosphodiester RNA bonds on the 3’-side of pyrimidine bases. It prefers poly(C) as a substrate and hydrolyzes 2’,3’-cyclic nucleotides, with a pH optimum near 8.0. The encoded protein is monomeric and more commonly acts to degrade ds-RNA over ss-RNA. Alternative splicing occurs at this locus and four transcript variants encoding the same protein have been identified. RNASE1 ENSG00000129538 NA
myoglobin 4151 This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. MB ENSG00000198125 NA
TSC22 domain family member 4 81628 TSC22D4 is a member of the TSC22 domain family of leucine zipper transcriptional regulators (see TSC22D3; MIM 300506) (Kester et al., 1999 [PubMed 10488076]; Fiorenza et al., 2001 [PubMed 11707329]). TSC22D4 ENSG00000166925 NA
# Save the Ensembl IDs queried for factor 7, one ID per line.
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_", 7, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
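
The annotation tables in this report can list the same query ID more than once (one Ensembl ID matching several records) and can flag IDs with notfound = TRUE, so the file written above may contain repeated or unannotated IDs. The following is a minimal sketch of one way to clean the ID list before writing it. It uses a small toy data frame (`out_toy`, a hypothetical stand-in) rather than a live `mygene::queryMany()` result, and it assumes the result has `query` and `notfound` columns as shown in the tables. The output file name and the use of `tempdir()` are illustrative only.

```r
# Toy stand-in for the data frame built from a queryMany() result.
# Column names mirror the annotation tables in this report (assumption).
out_toy <- data.frame(
  query    = c("ENSG00000143226", "ENSG00000143226",
               "ENSG00000259716", "ENSG00000011465"),
  symbol   = c("FCGR2A", "FCGR2C", NA, "DCN"),
  notfound = c(NA, NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Keep rows that were annotated: notfound is NA when the ID was found.
found <- out_toy[is.na(out_toy$notfound) | !out_toy$notfound, ]

# Collapse repeated query IDs, preserving their original order.
clean_ids <- unique(found$query)

# IDs that did not resolve to any record, kept for later inspection.
missing_ids <- out_toy$query[!is.na(out_toy$notfound) & out_toy$notfound]

# Write one ID per line (illustrative path in a temporary directory).
out_file <- file.path(tempdir(), "gene_names_clus_example.txt")
write.table(clean_ids, out_file,
            col.names = FALSE, row.names = FALSE, quote = FALSE)
```

If no query term is missing, the `notfound` column may be absent from the result, so a real script would likely check `"notfound" %in% names(out)` before filtering.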

Factor 8 Annotations

out <- mygene::queryMany(gene_list[8,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query symbol summary X_id name notfound
ENSG00000163220 S100A9 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. 6280 S100 calcium binding protein A9 NA
ENSG00000119535 CSF3R The protein encoded by this gene is the receptor for colony stimulating factor 3, a cytokine that controls the production, differentiation, and function of granulocytes. The encoded protein, which is a member of the family of cytokine receptors, may also function in some cell surface adhesion or recognition processes. Alternatively spliced transcript variants have been described. Mutations in this gene are a cause of Kostmann syndrome, also known as severe congenital neutropenia. 1441 colony stimulating factor 3 receptor NA
ENSG00000185201 IFITM2 NA 10581 interferon induced transmembrane protein 2 NA
ENSG00000188404 SELL This gene encodes a cell surface adhesion molecule that belongs to a family of adhesion/homing receptors. The encoded protein contains a C-type lectin-like domain, a calcium-binding epidermal growth factor-like domain, and two short complement-like repeats. The gene product is required for binding and subsequent rolling of leucocytes on endothelial cells, facilitating their migration into secondary lymphoid organs and inflammation sites. Single-nucleotide polymorphisms in this gene have been associated with various diseases including immunoglobulin A nephropathy. Alternatively spliced transcript variants have been found for this gene. 6402 selectin L NA
ENSG00000171051 FPR1 This gene encodes a G protein-coupled receptor of mammalian phagocytic cells that is a member of the G-protein coupled receptor 1 family. The protein mediates the response of phagocytic cells to invasion of the host by microorganisms and is important in host defense and inflammation. 2357 formyl peptide receptor 1 NA
ENSG00000136167 LCP1 Plastins are a family of actin-binding proteins that are conserved throughout eukaryote evolution and expressed in most tissues of higher eukaryotes. In humans, two ubiquitous plastin isoforms (L and T) have been identified. Plastin 1 (otherwise known as Fimbrin) is a third distinct plastin isoform which is specifically expressed at high levels in the small intestine. The L isoform is expressed only in hemopoietic cell lineages, while the T isoform has been found in all other normal cells of solid tissues that have replicative potential (fibroblasts, endothelial cells, epithelial cells, melanocytes, etc.). However, L-plastin has been found in many types of malignant human cells of non-hemopoietic origin suggesting that its expression is induced accompanying tumorigenesis in solid tissues. 3936 lymphocyte cytosolic protein 1 NA
ENSG00000008516 MMP25 Proteins of the matrix metalloproteinase (MMP) family are involved in the breakdown of extracellular matrix in normal physiological processes, such as embryonic development, reproduction, and tissue remodeling, as well as in disease processes, such as arthritis and metastasis. Most MMPs are secreted as inactive proproteins which are activated when cleaved by extracellular proteinases. However, the protein encoded by this gene is a member of the membrane-type MMP (MT-MMP) subfamily, attached to the plasma membrane via a glycosylphosphatidyl inositol anchor. In response to bacterial infection or inflammation, the encoded protein is thought to inactivate alpha-1 proteinase inhibitor, a major tissue protectant against proteolytic enzymes released by activated neutrophils, facilitating the transendothelial migration of neutrophils to inflammatory sites. The encoded protein may also play a role in tumor invasion and metastasis through activation of MMP2. The gene has previously been referred to as MMP20 but has been renamed MMP25. 64386 matrix metallopeptidase 25 NA
ENSG00000112303 VNN2 This gene product is a member of the Vanin family of proteins that share extensive sequence similarity with each other, and also with biotinidase. The family includes secreted and membrane-associated proteins, a few of which have been reported to participate in hematopoietic cell trafficking. No biotinidase activity has been demonstrated for any of the vanin proteins, however, they possess pantetheinase activity, which may play a role in oxidative-stress response. The encoded protein is a GPI-anchored cell surface molecule that plays a role in transendothelial migration of neutrophils. This gene lies in close proximity to, and in same transcriptional orientation as two other vanin genes on chromosome 6q23-q24. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. 8875 vanin 2 NA
ENSG00000115590 IL1R2 The protein encoded by this gene is a cytokine receptor that belongs to the interleukin 1 receptor family. This protein binds interleukin alpha (IL1A), interleukin beta (IL1B), and interleukin 1 receptor, type I(IL1R1/IL1RA), and acts as a decoy receptor that inhibits the activity of its ligands. Interleukin 4 (IL4) is reported to antagonize the activity of interleukin 1 by inducing the expression and release of this cytokine. This gene and three other genes form a cytokine receptor gene cluster on chromosome 2q12. Alternative splicing results in multiple transcript variants and protein isoforms. Alternative splicing produces both membrane-bound and soluble proteins. A soluble protein is also produced by proteolytic cleavage. 7850 interleukin 1 receptor type 2 NA
ENSG00000162747 FCGR3B The protein encoded by this gene is a low affinity receptor for the Fc region of gamma immunoglobulins (IgG). The encoded protein acts as a monomer and can bind either monomeric or aggregated IgG. This gene may function to capture immune complexes in the peripheral circulation. Several transcript variants encoding different isoforms have been found for this gene. A highly-similar gene encoding a related protein is also found on chromosome 1. 2215 Fc fragment of IgG receptor IIIb NA
ENSG00000107738 C10orf54 NA 64115 chromosome 10 open reading frame 54 NA
ENSG00000103569 AQP9 The aquaporins are a family of water-selective membrane channels. This gene encodes a member of a subset of aquaporins called the aquaglyceroporins. This protein allows passage of a broad range of noncharged solutes and also stimulates urea transport and osmotic water permeability. This protein may also facilitate the uptake of glycerol in hepatic tissue. The encoded protein may also play a role in specialized leukocyte functions such as immunological response and bactericidal activity. Alternate splicing results in multiple transcript variants. 366 aquaporin 9 NA
ENSG00000163563 MNDA The myeloid cell nuclear differentiation antigen (MNDA) is detected only in nuclei of cells of the granulocyte-monocyte lineage. A 200-amino acid region of human MNDA is strikingly similar to a region in the proteins encoded by a family of interferon-inducible mouse genes, designated Ifi-201, Ifi-202, and Ifi-203, that are not regulated in a cell- or tissue-specific fashion. The 1.8-kb MNDA mRNA, which contains an interferon-stimulated response element in the 5-prime untranslated region, was significantly upregulated in human monocytes exposed to interferon alpha. MNDA is located within 2,200 kb of FCER1A, APCS, CRP, and SPTA1. In its pattern of expression and/or regulation, MNDA resembles IFI16, suggesting that these genes participate in blood cell-specific responses to interferons. 4332 myeloid cell nuclear differentiation antigen NA
ENSG00000197249 SERPINA1 The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. 5265 serpin family A member 1 NA
ENSG00000163191 S100A11 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in motility, invasion, and tubulin polymerization. Chromosomal rearrangements and altered expression of this gene have been implicated in tumor metastasis. 6282 S100 calcium binding protein A11 NA
ENSG00000162551 ALPL This gene encodes a member of the alkaline phosphatase family of proteins. There are at least four distinct but related alkaline phosphatases: intestinal, placental, placental-like, and liver/bone/kidney (tissue non-specific). The first three are located together on chromosome 2, while the tissue non-specific form is located on chromosome 1. The product of this gene is a membrane bound glycosylated enzyme that is not expressed in any particular tissue and is, therefore, referred to as the tissue-nonspecific form of the enzyme. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature enzyme. This enzyme may play a role in bone mineralization. Mutations in this gene have been linked to hypophosphatasia, a disorder that is characterized by hypercalcemia and skeletal defects. 249 alkaline phosphatase, liver/bone/kidney NA
ENSG00000163464 CXCR1 The protein encoded by this gene is a member of the G-protein-coupled receptor family. This protein is a receptor for interleukin 8 (IL8). It binds to IL8 with high affinity, and transduces the signal through a G-protein activated second messenger system. Knockout studies in mice suggested that this protein inhibits embryonic oligodendrocyte precursor migration in developing spinal cord. This gene, IL8RB, a gene encoding another high affinity IL8 receptor, as well as IL8RBP, a pseudogene of IL8RB, form a gene cluster in a region mapped to chromosome 2q33-q36. 3577 C-X-C motif chemokine receptor 1 NA
ENSG00000133392 MYH11 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. 4629 myosin, heavy chain 11, smooth muscle NA
ENSG00000143546 S100A8 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and as a cytokine. Altered expression of this protein is associated with the disease cystic fibrosis. Multiple transcript variants encoding different isoforms have been found for this gene. 6279 S100 calcium binding protein A8 NA
ENSG00000111348 ARHGDIB Members of the Rho (or ARH) protein family (see MIM 165390) and other Ras-related small GTP-binding proteins (see MIM 179520) are involved in diverse cellular events, including cell signaling, proliferation, cytoskeletal organization, and secretion. The GTP-binding proteins are active only in the GTP-bound state. At least 3 classes of proteins tightly regulate cycling between the GTP-bound and GDP-bound states: GTPase-activating proteins (GAPs), guanine nucleotide-releasing factors (GRFs), and GDP-dissociation inhibitors (GDIs). The GDIs, including ARHGDIB, decrease the rate of GDP dissociation from Ras-like GTPases (summary by Scherle et al., 1993 [PubMed 8356058]). 397 Rho GDP dissociation inhibitor beta NA
ENSG00000162511 LAPTM5 This gene encodes a transmembrane receptor that is associated with lysosomes. The encoded protein, also known as E3 protein, may play a role in hematopoiesis. 7805 lysosomal protein transmembrane 5 NA
ENSG00000142347 MYO1F NA 4542 myosin IF NA
ENSG00000116701 NCF2 This gene encodes neutrophil cytosolic factor 2, the 67-kilodalton cytosolic subunit of the multi-protein NADPH oxidase complex found in neutrophils. This oxidase produces a burst of superoxide which is delivered to the lumen of the neutrophil phagosome. Mutations in this gene, as well as in other NADPH oxidase subunits, can result in chronic granulomatous disease, a disease that causes recurrent infections by catalase-positive organisms. Alternative splicing results in multiple transcript variants encoding different isoforms. 4688 neutrophil cytosolic factor 2 NA
ENSG00000143226 FCGR2A This gene encodes one member of a family of immunoglobulin Fc receptor genes found on the surface of many immune response cells. The protein encoded by this gene is a cell surface receptor found on phagocytic cells such as macrophages and neutrophils, and is involved in the process of phagocytosis and clearing of immune complexes. Alternative splicing results in multiple transcript variants. 2212 Fc fragment of IgG receptor IIa NA
ENSG00000143226 FCGR2C This gene encodes one of three members of a family of low-affinity immunoglobulin gamma Fc receptors found on the surface of many immune response cells. The encoded protein is a transmembrane glycoprotein and may be involved in phagocytosis and clearing of immune complexes. An allelic polymorphism in this gene results in both coding and non-coding variants. 9103 Fc fragment of IgG receptor IIc (gene/pseudogene) NA
ENSG00000122862 SRGN This gene encodes a protein best known as a hematopoietic cell granule proteoglycan. Proteoglycans stored in the secretory granules of many hematopoietic cells also contain a protease-resistant peptide core, which may be important for neutralizing hydrolytic enzymes. This encoded protein was found to be associated with the macromolecular complex of granzymes and perforin, which may serve as a mediator of granule-mediated apoptosis. Two transcript variants, only one of them protein-coding, have been found for this gene. 5552 serglycin NA
ENSG00000084070 SMAP2 NA 64744 small ArfGAP2 NA
ENSG00000105835 NAMPT This gene encodes a protein that catalyzes the condensation of nicotinamide with 5-phosphoribosyl-1-pyrophosphate to yield nicotinamide mononucleotide, one step in the biosynthesis of nicotinamide adenine dinucleotide. The protein belongs to the nicotinic acid phosphoribosyltransferase (NAPRTase) family and is thought to be involved in many important biological processes, including metabolism, stress response and aging. This gene has a pseudogene on chromosome 10. 10135 nicotinamide phosphoribosyltransferase NA
ENSG00000018280 SLC11A1 This gene is a member of the solute carrier family 11 (proton-coupled divalent metal ion transporters) family and encodes a multi-pass membrane protein. The protein functions as a divalent transition metal (iron and manganese) transporter involved in iron metabolism and host resistance to certain pathogens. Mutations in this gene have been associated with susceptibility to infectious diseases such as tuberculosis and leprosy, and inflammatory diseases such as rheumatoid arthritis and Crohn disease. Alternatively spliced variants that encode different protein isoforms have been described but the full-length nature of only one has been determined. 6556 solute carrier family 11 member 1 NA
ENSG00000000938 FGR This gene is a member of the Src family of protein tyrosine kinases (PTKs). The encoded protein contains N-terminal sites for myristylation and palmitylation, a PTK domain, and SH2 and SH3 domains which are involved in mediating protein-protein interactions with phosphotyrosine-containing and proline-rich motifs, respectively. The protein localizes to plasma membrane ruffles, and functions as a negative regulator of cell migration and adhesion triggered by the beta-2 integrin signal transduction pathway. Infection with Epstein-Barr virus results in the overexpression of this gene. Multiple alternatively spliced variants, encoding the same protein, have been identified. 2268 FGR proto-oncogene, Src family tyrosine kinase NA
ENSG00000101335 MYL9 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. 10398 myosin light chain 9 NA
ENSG00000110876 SELPLG This gene encodes a glycoprotein that functions as a high affinity counter-receptor for the cell adhesion molecules P-, E- and L- selectin expressed on myeloid cells and stimulated T lymphocytes. As such, this protein plays a critical role in leukocyte trafficking during inflammation by tethering of leukocytes to activated platelets or endothelia expressing selectins. This protein requires two post-translational modifications, tyrosine sulfation and the addition of the sialyl Lewis x tetrasaccharide (sLex) to its O-linked glycans, for its high-affinity binding activity. Aberrant expression of this gene and polymorphisms in this gene are associated with defects in the innate and adaptive immune response. Alternate splicing results in multiple transcript variants. 6404 selectin P ligand NA
ENSG00000100985 MMP9 Proteins of the matrix metalloproteinase (MMP) family are involved in the breakdown of extracellular matrix in normal physiological processes, such as embryonic development, reproduction, and tissue remodeling, as well as in disease processes, such as arthritis and metastasis. Most MMPs are secreted as inactive proproteins which are activated when cleaved by extracellular proteinases. The enzyme encoded by this gene degrades type IV and V collagens. Studies in rhesus monkeys suggest that the enzyme is involved in IL-8-induced mobilization of hematopoietic progenitor cells from bone marrow, and murine studies suggest a role in tumor-associated tissue remodeling. 4318 matrix metallopeptidase 9 NA
ENSG00000132965 ALOX5AP This gene encodes a protein which, with 5-lipoxygenase, is required for leukotriene synthesis. Leukotrienes are arachidonic acid metabolites which have been implicated in various types of inflammatory responses, including asthma, arthritis and psoriasis. This protein localizes to the plasma membrane. Inhibitors of its function impede translocation of 5-lipoxygenase from the cytoplasm to the cell membrane and inhibit 5-lipoxygenase activation. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. 241 arachidonate 5-lipoxygenase activating protein NA
ENSG00000101336 HCK The protein encoded by this gene is a member of the Src family of tyrosine kinases. This protein is primarily hemopoietic, particularly in cells of the myeloid and B-lymphoid lineages. It may help couple the Fc receptor to the activation of the respiratory burst. In addition, it may play a role in neutrophil migration and in the degranulation of neutrophils. Multiple isoforms with different subcellular distributions are produced due to both alternative splicing and the use of alternative translation initiation codons, including a non-AUG (CUG) codon. 3055 HCK proto-oncogene, Src family tyrosine kinase NA
ENSG00000066336 SPI1 This gene encodes an ETS-domain transcription factor that activates gene expression during myeloid and B-lymphoid cell development. The nuclear protein binds to a purine-rich sequence known as the PU-box found near the promoters of target genes, and regulates their expression in coordination with other transcription factors and cofactors. The protein can also regulate alternative splicing of target genes. Multiple transcript variants encoding different isoforms have been found for this gene. 6688 Spi-1 proto-oncogene NA
ENSG00000100234 TIMP3 This gene belongs to the TIMP gene family. The proteins encoded by this gene family are inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix (ECM). Expression of this gene is induced in response to mitogenic stimulation and this netrin domain-containing protein is localized to the ECM. Mutations in this gene have been associated with the autosomal dominant disorder Sorsby’s fundus dystrophy. 7078 TIMP metallopeptidase inhibitor 3 NA
ENSG00000160255 ITGB2 This gene encodes an integrin beta chain, which combines with multiple different alpha chains to form different integrin heterodimers. Integrins are integral cell-surface proteins that participate in cell adhesion as well as cell-surface mediated signalling. The encoded protein plays an important role in immune response and defects in this gene cause leukocyte adhesion deficiency. Alternative splicing results in multiple transcript variants. 3689 integrin subunit beta 2 NA
ENSG00000163221 S100A12 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein is proposed to be involved in specific calcium-dependent signal transduction pathways and its regulatory effect on cytoskeletal components may modulate various neutrophil activities. The protein includes an antimicrobial peptide which has antibacterial activity. 6283 S100 calcium binding protein A12 NA
ENSG00000132589 FLOT2 Caveolae are small domains on the inner cell membrane involved in vesicular trafficking and signal transduction. This gene encodes a caveolae-associated, integral membrane protein, which is thought to function in neuronal signaling. 2319 flotillin 2 NA
ENSG00000169180 XPO6 The protein encoded by this gene is a member of the importin-beta family. Members of this family are regulated by the GTPase Ran to mediate transport of cargo across the nuclear envelope. This protein has been shown to mediate nuclear export of profilin-actin complexes. A pseudogene of this gene is located on the long arm of chromosome 14. Alternative splicing results in multiple transcript variants that encode different protein isoforms. 23214 exportin 6 NA
ENSG00000090382 LYZ This gene encodes human lysozyme, whose natural substrate is the bacterial cell wall peptidoglycan (cleaving the beta[1-4]glycosidic linkages between N-acetylmuramic acid and N-acetylglucosamine). Lysozyme is one of the antimicrobial agents found in human milk, and is also present in spleen, lung, kidney, white blood cells, plasma, saliva, and tears. The protein has antibacterial activity against a number of bacterial species. Missense mutations in this gene have been identified in heritable renal amyloidosis. 4069 lysozyme NA
ENSG00000158869 FCER1G The high affinity IgE receptor is a key molecule involved in allergic reactions. It is a tetramer composed of 1 alpha, 1 beta, and 2 gamma chains. The gamma chains are also subunits of other Fc receptors. 2207 Fc fragment of IgE receptor Ig NA
ENSG00000142657 PGD 6-phosphogluconate dehydrogenase is the second dehydrogenase in the pentose phosphate shunt. Deficiency of this enzyme is generally asymptomatic, and the inheritance of this disorder is autosomal dominant. Hemolysis results from combined deficiency of 6-phosphogluconate dehydrogenase and 6-phosphogluconolactonase suggesting a synergism of the two enzymopathies. Several transcript variants encoding different isoforms have been found for this gene. 5226 phosphogluconate dehydrogenase NA
ENSG00000204525 HLA-C HLA-C belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon one encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domain, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region, and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Over one hundred HLA-C alleles have been described 3107 major histocompatibility complex, class I, C NA
ENSG00000187116 LILRA5 The protein encoded by this gene is a member of the leukocyte immunoglobulin-like receptor (LIR) family. LIR family members are known to have activating and inhibitory functions in leukocytes. Crosslink of this receptor protein on the surface of monocytes has been shown to induce calcium flux and secretion of several proinflammatory cytokines, which suggests the roles of this protein in triggering innate immune responses. This gene is one of the leukocyte receptor genes that form a gene cluster on the chromosomal region 19q13.4. Four alternatively spliced transcript variants encoding distinct isoforms have been described. 353514 leukocyte immunoglobulin like receptor A5 NA
ENSG00000140678 ITGAX This gene encodes the integrin alpha X chain protein. Integrins are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain. This protein combines with the beta 2 chain (ITGB2) to form a leukocyte-specific integrin referred to as inactivated-C3b (iC3b) receptor 4 (CR4). The alpha X beta 2 complex seems to overlap the properties of the alpha M beta 2 integrin in the adherence of neutrophils and monocytes to stimulated endothelium cells, and in the phagocytosis of complement coated particles. Two transcript variants encoding different isoforms have been found for this gene. 3687 integrin subunit alpha X NA
ENSG00000122786 CALD1 This gene encodes a calmodulin- and actin-binding protein that plays an essential role in the regulation of smooth muscle and nonmuscle contraction. The conserved domain of this protein possesses the binding activities to Ca(2+)-calmodulin, actin, tropomyosin, myosin, and phospholipids. This protein is a potent inhibitor of the actin-tropomyosin activated myosin MgATPase, and serves as a mediating factor for Ca(2+)-dependent inhibition of smooth muscle contraction. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. 800 caldesmon 1 NA
ENSG00000160883 HK3 Hexokinases phosphorylate glucose to produce glucose-6-phosphate, the first step in most glucose metabolism pathways. This gene encodes hexokinase 3. Similar to hexokinases 1 and 2, this allosteric enzyme is inhibited by its product glucose-6-phosphate. 3101 hexokinase 3 NA
ENSG00000180353 HCLS1 NA 3059 hematopoietic cell-specific Lyn substrate 1 NA
ENSG00000102879 CORO1A This gene encodes a member of the WD repeat protein family. WD repeats are minimally conserved regions of approximately 40 amino acids typically bracketed by gly-his and trp-asp (GH-WD), which may facilitate formation of heterotrimeric or multiprotein complexes. Members of this family are involved in a variety of cellular processes, including cell cycle progression, signal transduction, apoptosis, and gene regulation. Alternative splicing results in multiple transcript variants. A related pseudogene has been defined on chromosome 16. 11151 coronin 1A NA
ENSG00000244734 HBB The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. 3043 hemoglobin subunit beta NA
ENSG00000143119 CD53 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. This encoded protein is a cell surface glycoprotein that is known to complex with integrins. It contributes to the transduction of CD2-generated signals in T cells and natural killer cells and has been suggested to play a role in growth regulation. Familial deficiency of this gene has been linked to an immunodeficiency associated with recurrent infectious diseases caused by bacteria, fungi and viruses. Alternative splicing results in multiple transcript variants. 963 CD53 molecule NA
ENSG00000077984 CST7 The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and the kininogens. The type 2 cystatin proteins are a class of cysteine proteinase inhibitors found in a variety of human fluids and secretions. This gene encodes a glycosylated cysteine protease inhibitor with a putative role in immune regulation through inhibition of a unique target in the hematopoietic system. Expression of the protein has been observed in various human cancer cell lines established from malignant tumors. 8530 cystatin F NA
ENSG00000211445 GPX3 This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. 2878 glutathione peroxidase 3 NA
ENSG00000137642 SORL1 This gene encodes a mosaic protein that belongs to at least two families: the vacuolar protein sorting 10 (VPS10) domain-containing receptor family, and the low density lipoprotein receptor (LDLR) family. The encoded protein also contains fibronectin type III repeats and an epidermal growth factor repeat. The encoded preproprotein is proteolytically processed to generate the mature receptor, which likely plays roles in endocytosis and sorting. Mutations in this gene may be associated with Alzheimer’s disease. 6653 sortilin-related receptor, L(DLR class) A repeats containing NA
ENSG00000259716 NA NA NA NA TRUE
ENSG00000011465 DCN This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. 1634 decorin NA
ENSG00000141480 ARRB2 Members of arrestin/beta-arrestin protein family are thought to participate in agonist-mediated desensitization of G-protein-coupled receptors and cause specific dampening of cellular responses to stimuli such as hormones, neurotransmitters, or sensory signals. Arrestin beta 2, like arrestin beta 1, was shown to inhibit beta-adrenergic receptor function in vitro. It is expressed at high levels in the central nervous system and may play a role in the regulation of synaptic receptors. Besides the brain, a cDNA for arrestin beta 2 was isolated from thyroid gland, and thus it may also be involved in hormone-specific desensitization of TSH receptors. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. 409 arrestin beta 2 NA
ENSG00000115271 GCA This gene product, grancalcin, is a calcium-binding protein abundant in neutrophils and macrophages. It belongs to the penta-EF-hand subfamily of proteins which includes sorcin, calpain, and ALG-2. Grancalcin localization is dependent upon calcium and magnesium. In the absence of divalent cation, grancalcin localizes to the cytosolic fraction; with magnesium alone, it partitions with the granule fraction; and in the presence of magnesium and calcium, it associates with both the granule and membrane fractions, suggesting a role for grancalcin in granule-membrane fusion and degranulation. 25801 grancalcin NA
ENSG00000146094 DOK3 NA 79930 docking protein 3 NA
ENSG00000100365 NCF4 The protein encoded by this gene is a cytosolic regulatory component of the superoxide-producing phagocyte NADPH-oxidase, a multicomponent enzyme system important for host defense. This protein is preferentially expressed in cells of myeloid lineage. It interacts primarily with neutrophil cytosolic factor 2 (NCF2/p67-phox) to form a complex with neutrophil cytosolic factor 1 (NCF1/p47-phox), which further interacts with the small G protein RAC1 and translocates to the membrane upon cell stimulation. This complex then activates flavocytochrome b, the membrane-integrated catalytic core of the enzyme system. The PX domain of this protein can bind phospholipid products of the PI(3) kinase, which suggests its role in PI(3) kinase-mediated signaling events. The phosphorylation of this protein was found to negatively regulate the enzyme activity. Alternatively spliced transcript variants encoding distinct isoforms have been observed. 4689 neutrophil cytosolic factor 4 NA
ENSG00000079308 TNS1 The protein encoded by this gene localizes to focal adhesions, regions of the plasma membrane where the cell attaches to the extracellular matrix. This protein crosslinks actin filaments and contains a Src homology 2 (SH2) domain, which is often found in molecules involved in signal transduction. This protein is a substrate of calpain II. Alternative splicing results in multiple transcript variants encoding different isoforms. 7145 tensin 1 NA
ENSG00000076662 ICAM3 The protein encoded by this gene is a member of the intercellular adhesion molecule (ICAM) family. All ICAM proteins are type I transmembrane glycoproteins, contain 2-9 immunoglobulin-like C2-type domains, and bind to the leukocyte adhesion LFA-1 protein. This protein is constitutively and abundantly expressed by all leucocytes and may be the most important ligand for LFA-1 in the initiation of the immune response. It functions not only as an adhesion molecule, but also as a potent signalling molecule. Alternative splicing results in multiple transcript variants encoding different isoforms. 3385 intercellular adhesion molecule 3 NA
ENSG00000100504 PYGL This gene encodes a homodimeric protein that catalyses the cleavage of alpha-1,4-glucosidic bonds to release glucose-1-phosphate from liver glycogen stores. This protein switches from inactive phosphorylase B to active phosphorylase A by phosphorylation of serine residue 15. Activity of this enzyme is further regulated by multiple allosteric effectors and hormonal controls. Humans have three glycogen phosphorylase genes that encode distinct isozymes that are primarily expressed in liver, brain and muscle, respectively. The liver isozyme serves the glycemic demands of the body in general while the brain and muscle isozymes supply just those tissues. In glycogen storage disease type VI, also known as Hers disease, mutations in liver glycogen phosphorylase inhibit the conversion of glycogen to glucose and results in moderate hypoglycemia, mild ketosis, growth retardation and hepatomegaly. Alternative splicing results in multiple transcript variants encoding different isoforms. 5836 phosphorylase, glycogen, liver NA
ENSG00000234745 HLA-B HLA-B belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon 1 encodes the leader peptide, exon 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Hundreds of HLA-B alleles have been described. 3106 major histocompatibility complex, class I, B NA
ENSG00000128340 RAC2 This gene encodes a member of the Ras superfamily of small guanosine triphosphate (GTP)-metabolizing proteins. The encoded protein localizes to the plasma membrane, where it regulates diverse processes, such as secretion, phagocytosis, and cell polarization. Activity of this protein is also involved in the generation of reactive oxygen species. Mutations in this gene are associated with neutrophil immunodeficiency syndrome. There is a pseudogene for this gene on chromosome 6. 5880 ras-related C3 botulinum toxin substrate 2 (rho family, small GTP binding protein Rac2) NA
ENSG00000011600 TYROBP This gene encodes a transmembrane signaling polypeptide which contains an immunoreceptor tyrosine-based activation motif (ITAM) in its cytoplasmic domain. The encoded protein may associate with the killer-cell inhibitory receptor (KIR) family of membrane glycoproteins and may act as an activating signal transduction element. This protein may bind zeta-chain (TCR) associated protein kinase 70kDa (ZAP-70) and spleen tyrosine kinase (SYK) and play a role in signal transduction, bone modeling, brain myelination, and inflammation. Mutations within this gene have been associated with polycystic lipomembranous osteodysplasia with sclerosing leukoencephalopathy (PLOSL), also known as Nasu-Hakola disease. Its putative receptor, triggering receptor expressed on myeloid cells 2 (TREM2), also causes PLOSL. Multiple alternative transcript variants encoding distinct isoforms have been identified for this gene. 7305 TYRO protein tyrosine kinase binding protein NA
ENSG00000114626 ABTB1 This gene encodes a protein with an ankyrin repeat region and two BTB/POZ domains, which are thought to be involved in protein-protein interactions. Expression of this gene is activated by the phosphatase and tensin homolog, a tumor suppressor. Alternate splicing results in three transcript variants. 80325 ankyrin repeat and BTB domain containing 1 NA
ENSG00000204592 HLA-E HLA-E belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. HLA-E binds a restricted subset of peptides derived from the leader peptides of other class I molecules. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon one encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region, and exons 6 and 7 encode the cytoplasmic tail. 3133 major histocompatibility complex, class I, E NA
ENSG00000151726 ACSL1 The protein encoded by this gene is an isozyme of the long-chain fatty-acid-coenzyme A ligase family. Although differing in substrate specificity, subcellular localization, and tissue distribution, all isozymes of this family convert free long-chain fatty acids into fatty acyl-CoA esters, and thereby play a key role in lipid biosynthesis and fatty acid degradation. Several transcript variants encoding different isoforms have been found for this gene. 2180 acyl-CoA synthetase long-chain family member 1 NA
ENSG00000198736 MSRB1 This gene encodes a selenoprotein, which contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon that normally signals translation termination. The 3’ UTR of selenoprotein genes have a common stem-loop structure, the sec insertion sequence (SECIS), that is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. This protein belongs to the methionine sulfoxide reductase (Msr) protein family which includes repair enzymes that reduce oxidized methionine residues in proteins. The protein encoded by this gene is expressed in a variety of adult and fetal tissues and localizes to the cell nucleus and cytosol. 51734 methionine sulfoxide reductase B1 NA
ENSG00000148180 GSN The protein encoded by this gene binds to the ‘plus’ ends of actin monomers and filaments to prevent monomer exchange. The encoded calcium-regulated protein functions in both assembly and disassembly of actin filaments. Defects in this gene are a cause of familial amyloidosis Finnish type (FAF). Multiple transcript variants encoding several different isoforms have been found for this gene. 2934 gelsolin NA
ENSG00000213654 GPSM3 NA 63940 G-protein signaling modulator 3 NA
ENSG00000111913 FAM65B The protein encoded by this gene stimulates the formation of a non-mitotic multinucleate syncytium from proliferative cytotrophoblasts during trophoblast differentiation. Alternative splicing of this gene results in multiple transcript variants. 9750 family with sequence similarity 65 member B NA
ENSG00000021355 SERPINB1 The protein encoded by this gene is a member of the serpin family of proteinase inhibitors. Members of this family maintain homeostasis by neutralizing overexpressed proteinase activity through their function as suicide substrates. This protein inhibits the neutrophil-derived proteinases neutrophil elastase, cathepsin G, and proteinase-3 and thus protects tissues from damage at inflammatory sites. Alternative splicing results in multiple transcript variants. 1992 serpin family B member 1 NA
ENSG00000142173 COL6A2 This gene encodes one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The product of this gene contains several domains similar to von Willebrand Factor type A domains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in this gene are associated with Bethlem myopathy and Ullrich scleroatonic muscular dystrophy. Three transcript variants have been identified for this gene. 1292 collagen type VI alpha 2 NA
ENSG00000204936 CD177 This gene encodes a glycosyl-phosphatidylinositol (GPI)-linked cell surface glycoprotein that plays a role in neutrophil activation. The protein can bind platelet endothelial cell adhesion molecule-1 and function in neutrophil transmigration. Mutations in this gene are associated with myeloproliferative diseases. Over-expression of this gene has been found in patients with polycythemia rubra vera. Autoantibodies against the protein may result in pulmonary transfusion reactions, and it may be involved in Wegener’s granulomatosis. A related pseudogene, which is adjacent to this gene on chromosome 19, has been identified. 57126 CD177 molecule NA
ENSG00000177469 PTRF This gene encodes a protein that enables the dissociation of paused ternary polymerase I transcription complexes from the 3’ end of pre-rRNA transcripts. This protein regulates rRNA transcription by promoting the dissociation of transcription complexes and the reinitiation of polymerase I on nascent rRNA transcripts. This protein also localizes to caveolae at the plasma membrane and is thought to play a critical role in the formation of caveolae and the stabilization of caveolins. This protein translocates from caveolae to the cytoplasm after insulin stimulation. Caveolae contain truncated forms of this protein and may be the site of phosphorylation-dependent proteolysis. This protein is also thought to modify lipid metabolism and insulin-regulated gene expression. Mutations in this gene result in a disorder characterized by generalized lipodystrophy and muscular dystrophy. 284119 polymerase I and transcript release factor NA
ENSG00000103187 COTL1 This gene encodes one of the numerous actin-binding proteins which regulate the actin cytoskeleton. This protein binds F-actin, and also interacts with 5-lipoxygenase, which is the first committed enzyme in leukotriene biosynthesis. Although this gene has been reported to map to chromosome 17 in the Smith-Magenis syndrome region, the best alignments for this gene are to chromosome 16. The Smith-Magenis syndrome region is the site of two related pseudogenes. 23406 coactosin like F-actin binding protein 1 NA
ENSG00000173535 TNFRSF10C The protein encoded by this gene is a member of the TNF-receptor superfamily. This receptor contains an extracellular TRAIL-binding domain and a transmembrane domain, but no cytoplasmic death domain. This receptor is not capable of inducing apoptosis, and is thought to function as an antagonistic receptor that protects cells from TRAIL-induced apoptosis. This gene was found to be a p53-regulated DNA damage-inducible gene. The expression of this gene was detected in many normal tissues but not in most cancer cell lines, which may explain the specific sensitivity of cancer cells to the apoptosis-inducing activity of TRAIL. 8794 tumor necrosis factor receptor superfamily member 10c NA
ENSG00000189067 LITAF Lipopolysaccharide is a potent stimulator of monocytes and macrophages, causing secretion of tumor necrosis factor-alpha (TNF-alpha) and other inflammatory mediators. This gene encodes lipopolysaccharide-induced TNF-alpha factor, which is a DNA-binding protein and can mediate the TNF-alpha expression by direct binding to the promoter region of the TNF-alpha gene. The transcription of this gene is induced by tumor suppressor p53 and has been implicated in the p53-induced apoptotic pathway. Mutations in this gene cause Charcot-Marie-Tooth disease type 1C (CMT1C) and may be involved in the carcinogenesis of extramammary Paget’s disease (EMPD). Multiple alternatively spliced transcript variants have been found for this gene. 9516 lipopolysaccharide induced TNF factor NA
ENSG00000137462 TLR2 The protein encoded by this gene is a member of the Toll-like receptor (TLR) family which plays a fundamental role in pathogen recognition and activation of innate immunity. TLRs are highly conserved from Drosophila to humans and share structural and functional similarities. This protein is a cell-surface protein that can form heterodimers with other TLR family members to recognize conserved molecules derived from microorganisms known as pathogen-associated molecular patterns (PAMPs). Activation of TLRs by PAMPs leads to an up-regulation of signaling pathways to modulate the host’s inflammatory response. This gene is also thought to promote apoptosis in response to bacterial lipoproteins. This gene has been implicated in the pathogenesis of several autoimmune diseases. Alternative splicing results in multiple transcript variants. 7097 toll like receptor 2 NA
ENSG00000177105 RHOG This gene encodes a member of the Rho family of small GTPases, which cycle between inactive GDP-bound and active GTP-bound states and function as molecular switches in signal transduction cascades. Rho proteins promote reorganization of the actin cytoskeleton and regulate cell shape, attachment, and motility. The encoded protein facilitates translocation of a functional guanine nucleotide exchange factor (GEF) complex from the cytoplasm to the plasma membrane where ras-related C3 botulinum toxin substrate 1 is activated to promote lamellipodium formation and cell migration. Two related pseudogenes have been identified on chromosomes 20 and X. 391 ras homolog family member G NA
ENSG00000115607 IL18RAP The protein encoded by this gene is an accessory subunit of the heterodimeric receptor for interleukin 18 (IL18), a proinflammatory cytokine involved in inducing cell-mediated immunity. This protein enhances the IL18-binding activity of the IL18 receptor and plays a role in signaling by IL18. Mutations in this gene are associated with Crohn’s disease and inflammatory bowel disease, and susceptibility to celiac disease and leprosy. Alternatively spliced transcript variants of this gene have been described, but their full-length nature is not known. 8807 interleukin 18 receptor accessory protein NA
ENSG00000142798 HSPG2 This gene encodes the perlecan protein, which consists of a core protein to which three long chains of glycosaminoglycans (heparan sulfate or chondroitin sulfate) are attached. The perlecan protein is a large multidomain proteoglycan that binds to and cross-links many extracellular matrix components and cell-surface molecules. It has been shown that this protein interacts with laminin, prolargin, collagen type IV, FGFBP1, FBLN2, FGF7 and transthyretin, etc., and it plays essential roles in multiple biological activities. Perlecan is a key component of the vascular extracellular matrix, where it helps to maintain the endothelial barrier function. It is a potent inhibitor of smooth muscle cell proliferation and is thus thought to help maintain vascular homeostasis. It can also promote growth factor (e.g., FGF2) activity and thus stimulate endothelial growth and re-generation. It is a major component of basement membranes, where it is involved in the stabilization of other molecules as well as being involved with glomerular permeability to macromolecules and cell adhesion. Mutations in this gene cause Schwartz-Jampel syndrome type 1, Silverman-Handmaker type of dyssegmental dysplasia, and tardive dyskinesia. Alternative splicing of this gene results in multiple transcript variants. 3339 heparan sulfate proteoglycan 2 NA
ENSG00000124942 AHNAK NA 79026 AHNAK nucleoprotein NA
ENSG00000176788 BASP1 This gene encodes a membrane bound protein with several transient phosphorylation sites and PEST motifs. Conservation of proteins with PEST sequences among different species supports their functional significance. PEST sequences typically occur in proteins with high turnover rates. Immunological characteristics of this protein are species specific. This protein also undergoes N-terminal myristoylation. Alternative splicing results in multiple transcript variants that encode the same protein. 10409 brain abundant membrane attached signal protein 1 NA
ENSG00000121316 PLBD1 NA 79887 phospholipase B domain containing 1 NA
ENSG00000160796 NBEAL2 The protein encoded by this gene contains a beige and Chediak-Higashi (BEACH) domain and multiple WD40 domains, and may play a role in megakaryocyte alpha-granule biogenesis. Mutations in this gene are a cause of gray platelet syndrome. 23218 neurobeachin like 2 NA
ENSG00000101265 RASSF2 This gene encodes a protein that contains a Ras association domain. Similar to its cattle and sheep counterparts, this gene is located near the prion gene. Two alternatively spliced transcripts encoding the same isoform have been reported. 9770 Ras association domain family member 2 NA
ENSG00000261971 MMP25-AS1 NA ENSG00000261971 MMP25 antisense RNA 1 NA
ENSG00000171236 LRG1 The leucine-rich repeat (LRR) family of proteins, including LRG1, have been shown to be involved in protein-protein interaction, signal transduction, and cell adhesion and development. LRG1 is expressed during granulocyte differentiation (O’Donnell et al., 2002 [PubMed 12223515]). 116844 leucine rich alpha-2-glycoprotein 1 NA
ENSG00000155926 SLA NA 6503 Src-like-adaptor NA
ENSG00000160410 SHKBP1 NA 92799 SH3KBP1 binding protein 1 NA
ENSG00000149131 SERPING1 This gene encodes a highly glycosylated plasma protein involved in the regulation of the complement cascade. Its protein inhibits activated C1r and C1s of the first complement component and thus regulates complement activation. Deficiency of this protein is associated with hereditary angioneurotic oedema (HANE). Alternative splicing results in multiple transcript variants encoding the same isoform. 710 serpin family G member 1 NA
ENSG00000065534 MYLK This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3’ region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts. 4638 myosin light chain kinase NA
ENSG00000136156 ITM2B Amyloid precursor proteins are processed by beta-secretase and gamma-secretase to produce beta-amyloid peptides which form the characteristic plaques of Alzheimer disease. This gene encodes a transmembrane protein which is processed at the C-terminus by furin or furin-like proteases to produce a small secreted peptide which inhibits the deposition of beta-amyloid. Mutations which result in extension of the C-terminal end of the encoded protein, thereby increasing the size of the secreted peptide, are associated with two neurodegenerative diseases, familial British dementia and familial Danish dementia. 9445 integral membrane protein 2B NA
ENSG00000143878 RHOB NA 388 ras homolog family member B NA
ENSG00000131236 CAP1 The protein encoded by this gene is related to the S. cerevisiae CAP protein, which is involved in the cyclic AMP pathway. The human protein is able to interact with other molecules of the same protein, as well as with CAP2 and actin. Alternatively spliced transcript variants have been identified. 10487 CAP, adenylate cyclase-associated protein 1 (yeast) NA
ENSG00000177156 TALDO1 Transaldolase 1 is a key enzyme of the nonoxidative pentose phosphate pathway providing ribose-5-phosphate for nucleic acid synthesis and NADPH for lipid biosynthesis. This pathway can also maintain glutathione at a reduced state and thus protect sulfhydryl groups and cellular integrity from oxygen radicals. The functional gene of transaldolase 1 is located on chromosome 11 and a pseudogene is identified on chromosome 1 but there are conflicting map locations. The second and third exon of this gene were developed by insertion of a retrotransposable element. This gene is thought to be involved in multiple sclerosis. 6888 transaldolase 1 NA
# Save the Ensembl IDs queried for factor 8, one ID per line, with no header or row names.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_", 8, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
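
The export step above hard-codes the factor index in the file name and writes every queried ID, including any that mygene marks as not found (for example ENSG00000259716 in the factor 8 table). The following is a minimal sketch of one way to parameterize that step. The helper name `write_factor_genes`, its `drop_notfound` argument, and the mock data frame are hypothetical and are not part of the original analysis; the mock data stands in for a real `mygene::queryMany()` result so the example runs without network access.

```r
# Hypothetical helper (an assumption, not the original code): write the queried
# Ensembl IDs for factor k, optionally dropping IDs flagged as not found.
write_factor_genes <- function(out, k, out_dir, drop_notfound = FALSE) {
  out <- as.data.frame(out)
  ids <- as.character(out$query)
  if (drop_notfound && "notfound" %in% names(out)) {
    # notfound is TRUE for unmatched IDs and NA otherwise
    ids <- ids[!(out$notfound %in% TRUE)]
  }
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(ids, path, col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Mock result standing in for the output of mygene::queryMany()
mock_out <- data.frame(
  query    = c("ENSG00000244734", "ENSG00000259716", "ENSG00000011465"),
  symbol   = c("HBB", NA, "DCN"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Write to a temporary directory and read the file back
path <- write_factor_genes(mock_out, k = 8, out_dir = tempdir(), drop_notfound = TRUE)
written <- readLines(path)
```

With `drop_notfound = FALSE` the helper writes every queried ID, which matches the behaviour of the original call. Whether unmatched IDs should be kept depends on what the downstream tool expects, so the default leaves them in.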

Factor 9 Annotations

out <- mygene::queryMany(gene_list[9, ], scopes = "ensembl.gene",
                         fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name query symbol summary X_id
desmin ENSG00000175084 DES This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. 1674
CD74 molecule ENSG00000019582 CD74 The protein encoded by this gene associates with class II major histocompatibility complex (MHC) and is an important chaperone that regulates antigen presentation for immune response. It also serves as cell surface receptor for the cytokine macrophage migration inhibitory factor (MIF) which, when bound to the encoded protein, initiates survival pathways and cell proliferation. This protein also interacts with amyloid precursor protein (APP) and suppresses the production of amyloid beta (Abeta). Multiple alternatively spliced transcript variants encoding different isoforms have been identified. 972
myelin basic protein ENSG00000197971 MBP The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. 4155
beta-2-microglobulin ENSG00000166710 B2M This gene encodes a serum protein found in association with the major histocompatibility complex (MHC) class I heavy chain on the surface of nearly all nucleated cells. The protein has a predominantly beta-pleated sheet structure that can form amyloid fibrils in some pathological conditions. The encoded antimicrobial protein displays antibacterial activity in amniotic fluid. A mutation in this gene has been shown to result in hypercatabolic hypoproteinemia. 567
hemoglobin subunit beta ENSG00000244734 HBB The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. 3043
major histocompatibility complex, class II, DR alpha ENSG00000204287 HLA-DRA HLA-DRA is one of the HLA class II alpha chain paralogues. This class II molecule is a heterodimer consisting of an alpha and a beta chain, both anchored in the membrane. It plays a central role in the immune system by presenting peptides derived from extracellular proteins. Class II molecules are expressed in antigen presenting cells (APC: B lymphocytes, dendritic cells, macrophages). The alpha chain is approximately 33-35 kDa and its gene contains 5 exons. Exon 1 encodes the leader peptide, exons 2 and 3 encode the two extracellular domains, and exon 4 encodes the transmembrane domain and the cytoplasmic tail. DRA does not have polymorphisms in the peptide binding part and acts as the sole alpha chain for DRB1, DRB3, DRB4 and DRB5. 3122
tropomyosin 2 (beta) ENSG00000198467 TPM2 This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 7169
major histocompatibility complex, class I, E ENSG00000204592 HLA-E HLA-E belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. HLA-E binds a restricted subset of peptides derived from the leader peptides of other class I molecules. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon one encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region, and exons 6 and 7 encode the cytoplasmic tail. 3133
filamin C ENSG00000128591 FLNC This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene. 2318
actin gamma 1 ENSG00000184009 ACTG1 Actins are highly conserved proteins that are involved in various types of cell motility, and maintenance of the cytoskeleton. In vertebrates, three main groups of actin isoforms, alpha, beta and gamma have been identified. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton, and as mediators of internal cell motility. Actin, gamma 1, encoded by this gene, is a cytoplasmic actin found in non-muscle cells. Mutations in this gene are associated with DFNA20/26, a subtype of autosomal dominant non-syndromic sensorineural progressive hearing loss. Alternative splicing results in multiple transcript variants. 71
hemoglobin subunit alpha 2 ENSG00000188536 HBA2 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. 3040
protease, serine 1 ENSG00000204983 PRSS1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. 5644
carboxypeptidase A1 ENSG00000091704 CPA1 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. 1357
major histocompatibility complex, class I, C ENSG00000204525 HLA-C HLA-C belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon one encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region, and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Over one hundred HLA-C alleles have been described. 3107
major histocompatibility complex, class I, B ENSG00000234745 HLA-B HLA-B belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon 1 encodes the leader peptide, exon 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Hundreds of HLA-B alleles have been described. 3106
synaptopodin 2 ENSG00000172403 SYNPO2 NA 171024
major histocompatibility complex, class II, DP alpha 1 ENSG00000231389 HLA-DPA1 HLA-DPA1 belongs to the HLA class II alpha chain paralogues. This class II molecule is a heterodimer consisting of an alpha (DPA) and a beta (DPB) chain, both anchored in the membrane. It plays a central role in the immune system by presenting peptides derived from extracellular proteins. Class II molecules are expressed in antigen presenting cells (APC: B lymphocytes, dendritic cells, macrophages). The alpha chain is approximately 33-35 kDa and its gene contains 5 exons. Exon one encodes the leader peptide, exons 2 and 3 encode the two extracellular domains, exon 4 encodes the transmembrane domain and the cytoplasmic tail. Within the DP molecule both the alpha chain and the beta chain contain the polymorphisms specifying the peptide binding specificities, resulting in up to 4 different molecules. 3113
glycoprotein 2 ENSG00000169347 GP2 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. 2813
ribosomal protein L3 ENSG00000100316 RPL3 Ribosomes, the complexes that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L3P family of ribosomal proteins and it is located in the cytoplasm. The protein can bind to the HIV-1 TAR mRNA, and it has been suggested that the protein contributes to tat-mediated transactivation. This gene is co-transcribed with several small nucleolar RNA genes, which are located in several of this gene’s introns. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. 6122
ribosomal protein S6 ENSG00000137154 RPS6 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a cytoplasmic ribosomal protein that is a component of the 40S subunit. The protein belongs to the S6E family of ribosomal proteins. It is the major substrate of protein kinases in the ribosome, with subsets of five C-terminal serine residues phosphorylated by different protein kinases. Phosphorylation is induced by a wide range of stimuli, including growth factors, tumor-promoting agents, and mitogens. Dephosphorylation occurs at growth arrest. The protein may contribute to the control of cell growth and proliferation through the selective translation of particular classes of mRNA. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. 6194
titin ENSG00000155657 TTN This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, an N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. 7273
pancreatic lipase ENSG00000175535 PNLIP This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. 5406
tropomyosin 1 (alpha) ENSG00000140416 TPM1 This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. 7168
creatine kinase B ENSG00000166165 CKB The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in brain as well as in other tissues, and as a heterodimer with a similar muscle isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. A pseudogene of this gene has been characterized. 1152
chymotrypsin like elastase family member 3A ENSG00000142789 CELA3A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. 10136
carboxyl ester lipase ENSG00000170835 CEL The protein encoded by this gene is a glycoprotein secreted from the pancreas into the digestive tract and from the lactating mammary gland into human milk. The physiological role of this protein is in cholesterol and lipid-soluble vitamin ester hydrolysis and absorption. This encoded protein promotes large chylomicron production in the intestine. Also its presence in plasma suggests its interactions with cholesterol and oxidized lipoproteins to modulate the progression of atherosclerosis. In pancreatic tumoral cells, this encoded protein is thought to be sequestrated within the Golgi compartment and is probably not secreted. This gene contains a variable number of tandem repeat (VNTR) polymorphism in the coding region that may influence the function of the encoded protein. 1056
serglycin ENSG00000122862 SRGN This gene encodes a protein best known as a hematopoietic cell granule proteoglycan. Proteoglycans stored in the secretory granules of many hematopoietic cells also contain a protease-resistant peptide core, which may be important for neutralizing hydrolytic enzymes. This encoded protein was found to be associated with the macromolecular complex of granzymes and perforin, which may serve as a mediator of granule-mediated apoptosis. Two transcript variants, only one of them protein-coding, have been found for this gene. 5552
carboxypeptidase B1 ENSG00000153002 CPB1 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma. 1360
major histocompatibility complex, class II, DR beta 1 ENSG00000196126 HLA-DRB1 HLA-DRB1 belongs to the HLA class II beta chain paralogs. The class II molecule is a heterodimer consisting of an alpha (DRA) and a beta chain (DRB), both anchored in the membrane. It plays a central role in the immune system by presenting peptides derived from extracellular proteins. Class II molecules are expressed in antigen presenting cells (APC: B lymphocytes, dendritic cells, macrophages). The beta chain is approximately 26-28 kDa. It is encoded by 6 exons. Exon one encodes the leader peptide; exons 2 and 3 encode the two extracellular domains; exon 4 encodes the transmembrane domain; and exon 5 encodes the cytoplasmic tail. Within the DR molecule the beta chain contains all the polymorphisms specifying the peptide binding specificities. Hundreds of DRB1 alleles have been described and typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. DRB1 is expressed at a level five times higher than its paralogs DRB3, DRB4 and DRB5. DRB1 is present in all individuals. Allelic variants of DRB1 are linked with either none or one of the genes DRB3, DRB4 and DRB5. There are 5 related pseudogenes: DRB2, DRB6, DRB7, DRB8 and DRB9. 3123
HLA class II histocompatibility antigen, DRB1-7 beta chain ENSG00000196126 LOC105369230 NA 105369230
enolase 1 ENSG00000074800 ENO1 This gene encodes alpha-enolase, one of three enolase isoenzymes found in mammals. Each isoenzyme is a homodimer composed of 2 alpha, 2 gamma, or 2 beta subunits, and functions as a glycolytic enzyme. Alpha-enolase, in addition, functions as a structural lens protein (tau-crystallin) in the monomeric form. Alternative splicing of this gene results in a shorter isoform that has been shown to bind to the c-myc promoter and function as a tumor suppressor. Several pseudogenes have been identified, including one on the long arm of chromosome 1. Alpha-enolase has also been identified as an autoantigen in Hashimoto encephalopathy. 2023
lysosomal protein transmembrane 5 ENSG00000162511 LAPTM5 This gene encodes a transmembrane receptor that is associated with lysosomes. The encoded protein, also known as E3 protein, may play a role in hematopoiesis. 7805
NA ENSG00000266844 RP11-862L9.3 NA ENSG00000266844
eukaryotic translation elongation factor 1 alpha 1 ENSG00000156508 EEF1A1 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas, and the other isoform (alpha 2) is expressed in brain, heart and skeletal muscle. This isoform is identified as an autoantigen in 66% of patients with Felty syndrome. This gene has been found to have multiple copies on many chromosomes, some of which, if not all, represent different pseudogenes. 1915
ribosomal protein S18 ENSG00000231500 RPS18 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S13P family of ribosomal proteins. It is located in the cytoplasm. The gene product of the E. coli ortholog (ribosomal protein S13) is involved in the binding of fMet-tRNA, and thus, in the initiation of translation. This gene is an ortholog of mouse Ke3. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. 6222
clusterin ENSG00000120885 CLU The protein encoded by this gene is a secreted chaperone that can under some stress conditions also be found in the cell cytosol. It has been suggested to be involved in several basic biological events such as cell death, tumor progression, and neurodegenerative disorders. Alternate splicing results in both coding and non-coding variants. 1191
major histocompatibility complex, class I, A ENSG00000206503 HLA-A HLA-A belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon 1 encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region, and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Hundreds of HLA-A alleles have been described. 3105
hematopoietic cell-specific Lyn substrate 1 ENSG00000180353 HCLS1 NA 3059
myosin light chain 9 ENSG00000101335 MYL9 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. 10398
solute carrier family 2 member 3 ENSG00000059804 SLC2A3 NA 6515
surfactant protein A2 ENSG00000185303 SFTPA2 This gene is one of several genes encoding pulmonary-surfactant associated proteins (SFTPA) located on chromosome 10. Mutations in this gene and a highly similar gene located nearby, which affect the highly conserved carbohydrate recognition domain, are associated with idiopathic pulmonary fibrosis. The current version of the assembly displays only a single centromeric SFTPA gene pair rather than the two gene pairs shown in the previous assembly which were thought to have resulted from a duplication. 729238
CD53 molecule ENSG00000143119 CD53 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. This encoded protein is a cell surface glycoprotein that is known to complex with integrins. It contributes to the transduction of CD2-generated signals in T cells and natural killer cells and has been suggested to play a role in growth regulation. Familial deficiency of this gene has been linked to an immunodeficiency associated with recurrent infectious diseases caused by bacteria, fungi and viruses. Alternative splicing results in multiple transcript variants. 963
regenerating family member 1 alpha ENSG00000115386 REG1A This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. 5967
supervillin ENSG00000197321 SVIL This gene encodes a bipartite protein with distinct amino- and carboxy-terminal domains. The amino-terminus contains nuclear localization signals and the carboxy-terminus contains numerous consecutive sequences with extensive similarity to proteins in the gelsolin family of actin-binding proteins, which cap, nucleate, and/or sever actin filaments. The gene product is tightly associated with both actin filaments and plasma membranes, suggesting a role as a high-affinity link between the actin cytoskeleton and the membrane. The encoded protein appears to aid in both myosin II assembly during cell spreading and disassembly of focal adhesions. Several transcript variants encoding different isoforms of supervillin have been described. 6840
NDRG family member 2 ENSG00000165795 NDRG2 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that may play a role in neurite outgrowth. This gene may be involved in glioblastoma carcinogenesis. Several alternatively spliced transcript variants of this gene have been described, but the full-length nature of some of these variants has not been determined. 57447
T-cell immune regulator 1, ATPase H+ transporting V0 subunit a3 ENSG00000110719 TCIRG1 Through alternate splicing, this gene encodes two proteins with similarity to subunits of the vacuolar ATPase (V-ATPase) but the encoded proteins seem to have different functions. V-ATPase is a multisubunit enzyme that mediates acidification of eukaryotic intracellular organelles. V-ATPase dependent organelle acidification is necessary for such intracellular processes as protein sorting, zymogen activation, and receptor-mediated endocytosis. V-ATPase is comprised of a cytosolic V1 domain and a transmembrane V0 domain. Mutations in this gene are associated with infantile malignant osteopetrosis. 10312
actin, beta ENSG00000075624 ACTB This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. 60
secreted protein acidic and cysteine rich ENSG00000113140 SPARC This gene encodes a cysteine-rich acidic matrix-associated protein. The encoded protein is required for the collagen in bone to become calcified but is also involved in extracellular matrix synthesis and promotion of changes to cell shape. The gene product has been associated with tumor suppression but has also been correlated with metastasis based on changes to cell shape which can promote tumor cell invasion. Three transcript variants encoding different isoforms have been found for this gene. 6678
baculoviral IAP repeat containing 3 ENSG00000023445 BIRC3 This gene encodes a member of the IAP family of proteins that inhibit apoptosis by binding to tumor necrosis factor receptor-associated factors TRAF1 and TRAF2, probably by interfering with activation of ICE-like proteases. The encoded protein inhibits apoptosis induced by serum deprivation but does not affect apoptosis resulting from exposure to menadione, a potent inducer of free radicals. It contains 3 baculovirus IAP repeats and a ring finger domain. Transcript variants encoding the same isoform have been identified. 330
ribosomal protein S8 ENSG00000142937 RPS8 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S8E family of ribosomal proteins. It is located in the cytoplasm. Increased expression of this gene in colorectal tumors and colon polyps compared to matched normal colonic mucosa has been observed. This gene is co-transcribed with the small nucleolar RNA genes U38A, U38B, U39, and U40, which are located in its fourth, fifth, first, and second introns, respectively. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. 6202
immunoglobulin heavy constant gamma 1 (G1m marker) ENSG00000211896 IGHG1 NA ENSG00000211896
myosin, heavy chain 7, cardiac muscle, beta ENSG00000092054 MYH7 Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. 4625
glial fibrillary acidic protein ENSG00000131095 GFAP This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. 2670
transgelin 2 ENSG00000158710 TAGLN2 The protein encoded by this gene is similar to the protein transgelin, which is one of the earliest markers of differentiated smooth muscle. The specific function of this protein has not yet been determined, although it is thought to be a tumor suppressor. Multiple transcript variants encoding different isoforms have been found for this gene. 8407
surfactant protein A1 ENSG00000122852 SFTPA1 This gene encodes a lung surfactant protein that is a member of a subfamily of C-type lectins called collectins. The encoded protein binds specific carbohydrate moieties found on lipids and on the surface of microorganisms. This protein plays an essential role in surfactant homeostasis and in the defense against respiratory pathogens. Mutations in this gene are associated with idiopathic pulmonary fibrosis. Alternate splicing results in multiple transcript variants. 653509
colipase ENSG00000137392 CLPS The protein encoded by this gene is a cofactor needed by pancreatic lipase for efficient dietary lipid hydrolysis. It binds to the C-terminal, non-catalytic domain of lipase, thereby stabilizing an active conformation and considerably increasing the overall hydrophobic binding site. The gene product allows lipase to anchor noncovalently to the surface of lipid micelles, counteracting the destabilizing influence of intestinal bile salts. This cofactor is only expressed in pancreatic acinar cells, suggesting regulation of expression by tissue-specific elements. Three transcript variants encoding different isoforms have been found for this gene. 1208
TNF alpha induced protein 3 ENSG00000118503 TNFAIP3 This gene was identified as a gene whose expression is rapidly induced by the tumor necrosis factor (TNF). The protein encoded by this gene is a zinc finger protein and ubiquitin-editing enzyme, and has been shown to inhibit NF-kappa B activation as well as TNF-mediated apoptosis. The encoded protein, which has both ubiquitin ligase and deubiquitinase activities, is involved in the cytokine-mediated immune and inflammatory responses. Several transcript variants encoding the same protein have been found for this gene. 7128
coactosin like F-actin binding protein 1 ENSG00000103187 COTL1 This gene encodes one of the numerous actin-binding proteins which regulate the actin cytoskeleton. This protein binds F-actin, and also interacts with 5-lipoxygenase, which is the first committed enzyme in leukotriene biosynthesis. Although this gene has been reported to map to chromosome 17 in the Smith-Magenis syndrome region, the best alignments for this gene are to chromosome 16. The Smith-Magenis syndrome region is the site of two related pseudogenes. 23406
ribosomal protein S27a ENSG00000143947 RPS27A Ubiquitin, a highly conserved protein that has a major role in targeting cellular proteins for degradation by the 26S proteasome, is synthesized as a precursor protein consisting of either polyubiquitin chains or a single ubiquitin fused to an unrelated protein. This gene encodes a fusion protein consisting of ubiquitin at the N terminus and ribosomal protein S27a at the C terminus. When expressed in yeast, the protein is post-translationally processed, generating free ubiquitin monomer and ribosomal protein S27a. Ribosomal protein S27a is a component of the 40S subunit of the ribosome and belongs to the S27AE family of ribosomal proteins. It contains C4-type zinc finger domains and is located in the cytoplasm. Pseudogenes derived from this gene are present in the genome. As with ribosomal protein S27a, ribosomal protein L40 is also synthesized as a fusion protein with ubiquitin; similarly, ribosomal protein S30 is synthesized as a fusion protein with the ubiquitin-like protein fubi. Multiple alternatively spliced transcript variants that encode the same proteins have been identified. 6233
F-box protein 32 ENSG00000156804 FBXO32 This gene encodes a member of the F-box protein family which is characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of the ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into 3 classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbxs class and contains an F-box domain. This protein is highly expressed during muscle atrophy, whereas mice deficient in this gene were found to be resistant to atrophy. This protein is thus a potential drug target for the treatment of muscle atrophy. Alternative splicing results in multiple transcript variants encoding different isoforms. 114907
major histocompatibility complex, class II, DP beta 1 ENSG00000223865 HLA-DPB1 HLA-DPB belongs to the HLA class II beta chain paralogues. This class II molecule is a heterodimer consisting of an alpha (DPA) and a beta chain (DPB), both anchored in the membrane. It plays a central role in the immune system by presenting peptides derived from extracellular proteins. Class II molecules are expressed in antigen presenting cells (APC: B lymphocytes, dendritic cells, macrophages). The beta chain is approximately 26-28 kDa and its gene contains 6 exons. Exon one encodes the leader peptide, exons 2 and 3 encode the two extracellular domains, exon 4 encodes the transmembrane domain and exon 5 encodes the cytoplasmic tail. Within the DP molecule both the alpha chain and the beta chain contain the polymorphisms specifying the peptide binding specificities, resulting in up to 4 different molecules. 3115
coronin 1A ENSG00000102879 CORO1A This gene encodes a member of the WD repeat protein family. WD repeats are minimally conserved regions of approximately 40 amino acids typically bracketed by gly-his and trp-asp (GH-WD), which may facilitate formation of heterotrimeric or multiprotein complexes. Members of this family are involved in a variety of cellular processes, including cell cycle progression, signal transduction, apoptosis, and gene regulation. Alternative splicing results in multiple transcript variants. A related pseudogene has been defined on chromosome 16. 11151
hemoglobin subunit alpha 1 ENSG00000206172 HBA1 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. 3039
heat shock protein family B (small) member 7 ENSG00000173641 HSPB7 NA 27129
phosphodiesterase 4D interacting protein ENSG00000178104 PDE4DIP The protein encoded by this gene serves to anchor phosphodiesterase 4D to the Golgi/centrosome region of the cell. Defects in this gene may be a cause of myeloproliferative disorder (MBD) associated with eosinophilia. Several transcript variants encoding different isoforms have been found for this gene. 9659
cathepsin S ENSG00000163131 CTSS The protein encoded by this gene, a member of the peptidase C1 family, is a lysosomal cysteine proteinase that may participate in the degradation of antigenic proteins to peptides for presentation on MHC class II molecules. The encoded protein can function as an elastase over a broad pH range in alveolar macrophages. Alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. 1520
ribosomal protein lateral stalk subunit P0 ENSG00000089157 RPLP0 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein, which is the functional equivalent of the E. coli L10 ribosomal protein, belongs to the L10P family of ribosomal proteins. It is a neutral phosphoprotein with a C-terminal end that is nearly identical to the C-terminal ends of the acidic ribosomal phosphoproteins P1 and P2. The P0 protein can interact with P1 and P2 to form a pentameric complex consisting of P1 and P2 dimers, and a P0 monomer. The protein is located in the cytoplasm. Transcript variants derived from alternative splicing exist; they encode the same protein. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. 6175
maturin, neural progenitor differentiation regulator homolog (Xenopus) ENSG00000180354 MTURN NA 222166
chymotrypsinogen B2 ENSG00000168928 CTRB2 NA 440387
regulator of G-protein signaling 1 ENSG00000090104 RGS1 This gene encodes a member of the regulator of G-protein signalling family. This protein is located on the cytosolic side of the plasma membrane and contains a conserved, 120 amino acid motif called the RGS domain. The protein attenuates the signalling activity of G-proteins by binding to activated, GTP-bound G alpha subunits and acting as a GTPase activating protein (GAP), increasing the rate of conversion of the GTP to GDP. This hydrolysis allows the G alpha subunits to bind G beta/gamma subunit heterodimers, forming inactive G-protein heterotrimers, thereby terminating the signal. 5996
ribosomal protein L11 ENSG00000142676 RPL11 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L5P family of ribosomal proteins. It is located in the cytoplasm. The protein probably associates with the 5S rRNA. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. 6135
ribosomal protein SA ENSG00000168028 RPSA Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Many of the effects of laminin are mediated through interactions with cell surface receptors. These receptors include members of the integrin family, as well as non-integrin laminin-binding proteins. This gene encodes a high-affinity, non-integrin family, laminin receptor 1. This receptor has been variously called 67 kD laminin receptor, 37 kD laminin receptor precursor (37LRP) and p40 ribosome-associated protein. The amino acid sequence of laminin receptor 1 is highly conserved through evolution, suggesting a key biological function. It has been observed that the level of the laminin receptor transcript is higher in colon carcinoma tissue and lung cancer cell line than their normal counterparts. Also, there is a correlation between the upregulation of this polypeptide in cancer cells and their invasive and metastatic phenotype. Multiple copies of this gene exist, however, most of them are pseudogenes thought to have arisen from retropositional events. Two alternatively spliced transcript variants encoding the same protein have been found for this gene. 3921
pleckstrin homology domain containing B1 ENSG00000021300 PLEKHB1 NA 58473
immunoglobulin heavy constant mu ENSG00000211899 IGHM NA ENSG00000211899
chymotrypsin like elastase family member 3B ENSG00000219073 CELA3B Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3B has little elastolytic activity. Like most of the human elastases, elastase 3B is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3B preferentially cleaves proteins after alanine residues. Elastase 3B may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1, and excretion of this protein in fecal material is frequently used as a measure of pancreatic function in clinical assays. 23436
signal transducer and activator of transcription 1 ENSG00000115415 STAT1 The protein encoded by this gene is a member of the STAT protein family. In response to cytokines and growth factors, STAT family members are phosphorylated by the receptor associated kinases, and then form homo- or heterodimers that translocate to the cell nucleus where they act as transcription activators. This protein can be activated by various ligands including interferon-alpha, interferon-gamma, EGF, PDGF and IL6. This protein mediates the expression of a variety of genes, which is thought to be important for cell viability in response to different cell stimuli and pathogens. Two alternatively spliced transcript variants encoding distinct isoforms have been described. 6772
carboxypeptidase E ENSG00000109472 CPE This gene encodes a member of the M14 family of metallocarboxypeptidases. The encoded preproprotein is proteolytically processed to generate the mature peptidase. This peripheral membrane protein cleaves C-terminal amino acid residues and is involved in the biosynthesis of peptide hormones and neurotransmitters, including insulin. This protein may also function independently of its peptidase activity, as a neurotrophic factor that promotes neuronal survival, and as a sorting receptor that binds to regulated secretory pathway proteins, including prohormones. Mutations in this gene are implicated in type 2 diabetes. 1363
tumor protein p53 inducible nuclear protein 2 ENSG00000078804 TP53INP2 NA 58476
sorbin and SH3 domain containing 1 ENSG00000095637 SORBS1 This gene encodes a CBL-associated protein which functions in the signaling and stimulation of insulin. Mutations in this gene may be associated with human disorders of insulin resistance. Alternative splicing results in multiple transcript variants. 10580
chymotrypsinogen B1 ENSG00000168925 CTRB1 The protein encoded by this gene is one of a family of serine proteases that is secreted into the gastrointestinal tract as an inactive precursor, which is activated by proteolytic cleavage with trypsin. 1504
integrin subunit beta 2 ENSG00000160255 ITGB2 This gene encodes an integrin beta chain, which combines with multiple different alpha chains to form different integrin heterodimers. Integrins are integral cell-surface proteins that participate in cell adhesion as well as cell-surface mediated signalling. The encoded protein plays an important role in immune response and defects in this gene cause leukocyte adhesion deficiency. Alternative splicing results in multiple transcript variants. 3689
intercellular adhesion molecule 1 ENSG00000090339 ICAM1 This gene encodes a cell surface glycoprotein which is typically expressed on endothelial cells and cells of the immune system. It binds to integrins of type CD11a / CD18, or CD11b / CD18 and is also exploited by Rhinovirus as a receptor. 3383
MX dynamin like GTPase 1 ENSG00000157601 MX1 This gene encodes a guanosine triphosphate (GTP)-metabolizing protein that participates in the cellular antiviral response. The encoded protein is induced by type I and type II interferons and antagonizes the replication process of several different RNA and DNA viruses. There is a related gene located adjacent to this gene on chromosome 21, and there are multiple pseudogenes located in a cluster on chromosome 4. Alternative splicing results in multiple transcript variants. 4599
ribosomal protein S19 ENSG00000105372 RPS19 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S19E family of ribosomal proteins. It is located in the cytoplasm. Mutations in this gene cause Diamond-Blackfan anemia (DBA), a constitutional erythroblastopenia characterized by absent or decreased erythroid precursors, in a subset of patients. This suggests a possible extra-ribosomal function for this gene in erythropoietic differentiation and proliferation, in addition to its ribosomal function. Higher expression levels of this gene in some primary colon carcinomas compared to matched normal colon tissues has been observed. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. 6223
creatine kinase, M-type ENSG00000104879 CKM The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. 1158
crystallin alpha B ENSG00000109846 CRYAB Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. 1410
eukaryotic translation elongation factor 1 alpha 1 pseudogene 5 ENSG00000196205 EEF1A1P5 NA ENSG00000196205
surfactant protein C ENSG00000168484 SFTPC This gene encodes the pulmonary-associated surfactant protein C (SPC), an extremely hydrophobic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 2, also called pulmonary alveolar proteinosis due to surfactant protein C deficiency, and are associated with interstitial lung disease in older infants, children, and adults. Alternatively spliced transcript variants encoding different protein isoforms have been identified. 6440
NLR family CARD domain containing 5 ENSG00000140853 NLRC5 This gene encodes a member of the caspase recruitment domain-containing NLR family. This gene plays a role in cytokine response and antiviral immunity through its inhibition of NF-kappa-B activation and negative regulation of type I interferon signaling pathways. 84166
ribosomal protein S3 ENSG00000149273 RPS3 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit, where it forms part of the domain where translation is initiated. The protein belongs to the S3P family of ribosomal proteins. Studies of the mouse and rat proteins have demonstrated that the protein has an extraribosomal role as an endonuclease involved in the repair of UV-induced DNA damage. The protein appears to be located in both the cytoplasm and nucleus but not in the nucleolus. Higher levels of expression of this gene in colon adenocarcinomas and adenomatous polyps compared to adjacent normal colonic mucosa have been observed. This gene is co-transcribed with the small nucleolar RNA genes U15A and U15B, which are located in its first and fifth introns, respectively. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. 6188
protein tyrosine phosphatase, non-receptor type 6 ENSG00000111679 PTPN6 The protein encoded by this gene is a member of the protein tyrosine phosphatase (PTP) family. PTPs are known to be signaling molecules that regulate a variety of cellular processes including cell growth, differentiation, mitotic cycle, and oncogenic transformation. N-terminal part of this PTP contains two tandem Src homolog (SH2) domains, which act as protein phospho-tyrosine binding domains, and mediate the interaction of this PTP with its substrates. This PTP is expressed primarily in hematopoietic cells, and functions as an important regulator of multiple signaling pathways in hematopoietic cells. This PTP has been shown to interact with, and dephosphorylate a wide spectrum of phospho-proteins involved in hematopoietic cell signaling. Multiple alternatively spliced variants of this gene, which encode distinct isoforms, have been reported. 5777
cathepsin H ENSG00000103811 CTSH The protein encoded by this gene is a lysosomal cysteine proteinase important in the overall degradation of lysosomal proteins. It is composed of a dimer of disulfide-linked heavy and light chains, both produced from a single protein precursor. The encoded protein, which belongs to the peptidase C1 protein family, can act both as an aminopeptidase and as an endopeptidase. Increased expression of this gene has been correlated with malignant progression of prostate tumors. Alternate splicing of this gene results in multiple transcript variants encoding different isoforms. 1512
kinesin family member 1A ENSG00000130294 KIF1A The protein encoded by this gene is a member of the kinesin family and functions as an anterograde motor protein that transports membranous organelles along axonal microtubules. Mutations at this locus have been associated with spastic paraplegia-30 and hereditary sensory neuropathy IIC. Alternatively spliced transcript variants encoding distinct isoforms have been described. 547
calmodulin 1 (phosphorylase kinase, delta) ENSG00000198668 CALM1 This gene encodes a member of the EF-hand calcium-binding protein family. It is one of three genes which encode an identical calcium binding protein which is one of the four subunits of phosphorylase kinase. Two pseudogenes have been identified on chromosome 7 and X. Multiple transcript variants encoding different isoforms have been found for this gene. 801
calmodulin 2 (phosphorylase kinase, delta) ENSG00000198668 CALM2 This gene is a member of the calmodulin gene family. There are three distinct calmodulin genes dispersed throughout the genome that encode the identical protein, but differ at the nucleotide level. Calmodulin is a calcium binding protein that plays a role in signaling pathways, cell cycle progression and proliferation. Several infants with severe forms of long-QT syndrome (LQTS) who displayed life-threatening ventricular arrhythmias together with delayed neurodevelopment and epilepsy were found to have mutations in either this gene or another member of the calmodulin gene family (PMID:23388215). Mutations in this gene have also been identified in patients with less severe forms of LQTS (PMID:24917665), while mutations in another calmodulin gene family member have been associated with catecholaminergic polymorphic ventricular tachycardia (CPVT)(PMID:23040497), a rare disorder thought to be the cause of a significant fraction of sudden cardiac deaths in young individuals. Pseudogenes of this gene are found on chromosomes 10, 13, and 17. Alternative splicing results in multiple transcript variants encoding different isoforms. 805
phosphoprotein enriched in astrocytes 15 ENSG00000162734 PEA15 This gene encodes a death effector domain-containing protein that functions as a negative regulator of apoptosis. The encoded protein is an endogenous substrate for protein kinase C. This protein is also overexpressed in type 2 diabetes mellitus, where it may contribute to insulin resistance in glucose uptake. Alternative splicing results in multiple transcript variants. 8682
bone marrow stromal cell antigen 2 ENSG00000130303 BST2 Bone marrow stromal cells are involved in the growth and development of B-cells. The specific function of the protein encoded by the bone marrow stromal cell antigen 2 is undetermined; however, this protein may play a role in pre-B-cell growth and in rheumatoid arthritis. 684
myoglobin ENSG00000198125 MB This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. 4151
gap junction protein alpha 1 ENSG00000152661 GJA1 This gene is a member of the connexin gene family. The encoded protein is a component of gap junctions, which are composed of arrays of intercellular channels that provide a route for the diffusion of low molecular weight materials from cell to cell. The encoded protein is the major protein of gap junctions in the heart that are thought to have a crucial role in the synchronized contraction of the heart and in embryonic development. A related intronless pseudogene has been mapped to chromosome 5. Mutations in this gene have been associated with oculodentodigital dysplasia, autosomal recessive craniometaphyseal dysplasia and heart malformations. 2697
microtubule associated protein 1A ENSG00000166963 MAP1A This gene encodes a protein that belongs to the microtubule-associated protein family. The proteins of this family are thought to be involved in microtubule assembly, which is an essential step in neurogenesis. The product of this gene is a precursor polypeptide that presumably undergoes proteolytic processing to generate the final MAP1A heavy chain and LC2 light chain. Expression of this gene is almost exclusively in the brain. Studies of the rat microtubule-associated protein 1A gene suggested a role in early events of spinal cord development. 4130
prothymosin, alpha ENSG00000187514 PTMA NA 5757
prothymosin alpha-like ENSG00000187514 LOC728026 NA 728026
laminin subunit beta 2 ENSG00000172037 LAMB2 Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins, composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively), form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. Several isoforms of each chain have been described. Different alpha, beta and gamma chain isomers combine to give rise to different heterotrimeric laminin isoforms which are designated by Arabic numerals in the order of their discovery, i.e. alpha1beta1gamma1 heterotrimer is laminin 1. The biological functions of the different chains and trimer molecules are largely unknown, but some of the chains have been shown to differ with respect to their tissue distribution, presumably reflecting diverse functions in vivo. This gene encodes the beta chain isoform laminin, beta 2. The beta 2 chain contains the 7 structural domains typical of beta chains of laminin, including the short alpha region. However, unlike beta 1 chain, beta 2 has a more restricted tissue distribution. It is enriched in the basement membrane of muscles at the neuromuscular junctions, kidney glomerulus and vascular smooth muscle. Transgenic mice in which the beta 2 chain gene was inactivated by homologous recombination, showed defects in the maturation of neuromuscular junctions and impairment of glomerular filtration. Alternative splicing involving a non consensus 5’ splice site (gc) in the 5’ UTR of this gene has been reported. It was suggested that inefficient splicing of this first intron, which does not change the protein sequence, results in a greater abundance of the unspliced form of the transcript than the spliced form. The full-length nature of the spliced transcript is not known. 3913
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_",9,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
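
Some Ensembl IDs in these tables map to more than one record (for example ENSG00000198668 appears with both CALM1 and CALM2, and ENSG00000187514 with both PTMA and LOC728026), and some return no hit at all (ENSG00000090920 is flagged `notfound` in the Factor 10 table). Since `out$query` is written out as is, such IDs would appear more than once in the exported gene list. The sketch below shows one possible way to flag these cases before writing. It uses a small toy data frame (`out_toy`, a hypothetical stand-in holding only three of the columns returned by the query), not the real `out` object, and the variable names are illustrative assumptions.

```r
# Toy stand-in for as.data.frame(out); values are copied from the
# Factor 9 and Factor 10 tables purely for illustration.
out_toy <- data.frame(
  query    = c("ENSG00000198668", "ENSG00000198668", "ENSG00000187514",
               "ENSG00000187514", "ENSG00000090920", "ENSG00000042832"),
  symbol   = c("CALM1", "CALM2", "PTMA", "LOC728026", NA, "TG"),
  notfound = c(NA, NA, NA, NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Queries with no hit: notfound is TRUE there and NA for matched rows,
# so %in% TRUE avoids NA subscripts.
unmatched <- out_toy$query[out_toy$notfound %in% TRUE]

# Queries that came back with more than one record.
multi_hit <- unique(out_toy$query[duplicated(out_toy$query)])

# One entry per input gene, keeping the original order.
unique_queries <- unique(out_toy$query)

print(unmatched)
print(multi_hit)
print(unique_queries)
```

Under these assumptions, passing `unique_queries` (rather than the raw `query` column) to `write.table` would give one line per input gene; whether unmatched IDs should stay in the exported list depends on the downstream tool.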

Factor 10 Annotations

out <- mygene::queryMany(gene_list[10,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id summary name symbol query notfound
7038 Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. thyroglobulin TG ENSG00000042832 NA
7173 This gene encodes a membrane-bound glycoprotein. The encoded protein acts as an enzyme and plays a central role in thyroid gland function. The protein functions in the iodination of tyrosine residues in thyroglobulin and phenoxy-ester formation between pairs of iodinated tyrosines to generate the thyroid hormones, thyroxine and triiodothyronine. Mutations in this gene are associated with several disorders of thyroid hormonogenesis, including congenital hypothyroidism, congenital goiter, and thyroid hormone organification defect IIA. Multiple transcript variants encoding distinct isoforms have been identified for this gene, but the full-length nature of some variants has not been determined. thyroid peroxidase TPO ENSG00000115705 NA
1674 This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. desmin DES ENSG00000175084 NA
7849 This gene encodes a member of the paired box (PAX) family of transcription factors. Members of this gene family typically encode proteins that contain a paired box domain, an octapeptide, and a paired-type homeodomain. This nuclear protein is involved in thyroid follicular cell development and expression of thyroid-specific genes. Mutations in this gene have been associated with thyroid dysgenesis, thyroid follicular carcinomas and atypical follicular thyroid adenomas. Alternatively spliced transcript variants encoding different isoforms have been described. paired box 8 PAX8 ENSG00000125618 NA
1191 The protein encoded by this gene is a secreted chaperone that can under some stress conditions also be found in the cell cytosol. It has been suggested to be involved in several basic biological events such as cell death, tumor progression, and neurodegenerative disorders. Alternate splicing results in both coding and non-coding variants. clusterin CLU ENSG00000120885 NA
NA NA NA NA ENSG00000090920 TRUE
5909 This gene encodes a type of GTPase-activating-protein (GAP) that down-regulates the activity of the ras-related RAP1 protein. RAP1 acts as a molecular switch by cycling between an inactive GDP-bound form and an active GTP-bound form. The product of this gene, RAP1GAP, promotes the hydrolysis of bound GTP and hence returns RAP1 to the inactive state whereas other proteins, guanine nucleotide exchange factors (GEFs), act as RAP1 activators by facilitating the conversion of RAP1 from the GDP- to the GTP-bound form. In general, ras subfamily proteins, such as RAP1, play key roles in receptor-linked signaling pathways that control cell growth and differentiation. RAP1 plays a role in diverse processes such as cell proliferation, adhesion, differentiation, and embryogenesis. Alternative splicing results in multiple transcript variants encoding distinct proteins. RAP1 GTPase activating protein RAP1GAP ENSG00000076864 NA
4629 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. myosin, heavy chain 11, smooth muscle MYH11 ENSG00000133392 NA
2335 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. fibronectin 1 FN1 ENSG00000115414 NA
79026 NA AHNAK nucleoprotein AHNAK ENSG00000124942 NA
283131 This gene produces a long non-coding RNA (lncRNA) transcribed from the multiple endocrine neoplasia locus. This lncRNA is retained in the nucleus where it forms the core structural component of the paraspeckle sub-organelles. It may act as a transcriptional regulator for numerous genes, including some genes involved in cancer progression. nuclear paraspeckle assembly transcript 1 (non-protein coding) NEAT1 ENSG00000245532 NA
7178 NA tumor protein, translationally-controlled 1 TPT1 ENSG00000133112 NA
2597 This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. glyceraldehyde-3-phosphate dehydrogenase GAPDH ENSG00000111640 NA
7184 This gene encodes a member of a family of adenosine triphosphate(ATP)-metabolizing molecular chaperones with roles in stabilizing and folding other proteins. The encoded protein is localized to melanosomes and the endoplasmic reticulum. Expression of this protein is associated with a variety of pathogenic states, including tumor formation. There is a microRNA gene located within the 5’ exon of this gene. There are pseudogenes for this gene on chromosomes 1 and 15. heat shock protein 90kDa beta family member 1 HSP90B1 ENSG00000166598 NA
226 The protein encoded by this gene, Aldolase A (fructose-bisphosphate aldolase), is a glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Three aldolase isozymes (A, B, and C), encoded by three different genes, are differentially expressed during development. Aldolase A is found in the developing embryo and is produced in even greater amounts in adult muscle. Aldolase A expression is repressed in adult liver, kidney and intestine and similar to aldolase C levels in brain and other nervous tissue. Aldolase A deficiency has been associated with myopathy and hemolytic anemia. Alternative splicing and alternative promoter usage results in multiple transcript variants. Related pseudogenes have been identified on chromosomes 3 and 10. aldolase, fructose-bisphosphate A ALDOA ENSG00000149925 NA
9388 The protein encoded by this gene has substantial phospholipase activity and may be involved in lipoprotein metabolism and vascular biology. This protein is designated a member of the TG lipase family by its sequence and characteristic lid region which provides substrate specificity for enzymes of the TG lipase family. lipase G, endothelial type LIPG ENSG00000101670 NA
2778 This locus has a highly complex imprinted expression pattern. It gives rise to maternally, paternally, and biallelically expressed transcripts that are derived from four alternative promoters and 5’ exons. Some transcripts contain a differentially methylated region (DMR) at their 5’ exons, and this DMR is commonly found in imprinted genes and correlates with transcript expression. An antisense transcript is produced from an overlapping locus on the opposite strand. One of the transcripts produced from this locus, and the antisense transcript, are paternally expressed noncoding RNAs, and may regulate imprinting in this region. In addition, one of the transcripts contains a second overlapping ORF, which encodes a structurally unrelated protein - Alex. Alternative splicing of downstream exons is also observed, which results in different forms of the stimulatory G-protein alpha subunit, a key element of the classical signal transduction pathway linking receptor-ligand interactions with the activation of adenylyl cyclase and a variety of cellular responses. Multiple transcript variants encoding different isoforms have been found for this gene. Mutations in this gene result in pseudohypoparathyroidism type 1a, pseudohypoparathyroidism type 1b, Albright hereditary osteodystrophy, pseudopseudohypoparathyroidism, McCune-Albright syndrome, progressive osseous heteroplasia, polyostotic fibrous dysplasia of bone, and some pituitary tumors. GNAS complex locus GNAS ENSG00000087460 NA
1508 This gene encodes a member of the C1 family of peptidases. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate multiple protein products. These products include the cathepsin B light and heavy chains, which can dimerize to form the double chain form of the enzyme. This enzyme is a lysosomal cysteine protease with both endopeptidase and exopeptidase activity that may play a role in protein turnover. It is also known as amyloid precursor protein secretase and is involved in the proteolytic processing of amyloid precursor protein (APP). Incomplete proteolytic processing of APP has been suggested to be a causative factor in Alzheimer’s disease, the most common cause of dementia. Overexpression of the encoded protein has been associated with esophageal adenocarcinoma and other tumors. Multiple pseudogenes of this gene have been identified. cathepsin B CTSB ENSG00000164733 NA
6652 Sorbitol dehydrogenase (SORD; EC 1.1.1.14) catalyzes the interconversion of polyols and their corresponding ketoses, and together with aldose reductase (ALDR1; MIM 103880), makes up the sorbitol pathway that is believed to play an important role in the development of diabetic complications (summarized by Carr and Markham, 1995 [PubMed 8535074]). The first reaction of the pathway (also called the polyol pathway) is the reduction of glucose to sorbitol by ALDR1 with NADPH as the cofactor. SORD then oxidizes the sorbitol to fructose using NAD(+) cofactor. sorbitol dehydrogenase SORD ENSG00000140263 NA
301 This gene encodes a membrane-localized protein that binds phospholipids. This protein inhibits phospholipase A2 and has anti-inflammatory activity. Loss of function or expression of this gene has been detected in multiple tumors. annexin A1 ANXA1 ENSG00000135046 NA
351 This gene encodes a cell surface receptor and transmembrane precursor protein that is cleaved by secretases to form a number of peptides. Some of these peptides are secreted and can bind to the acetyltransferase complex APBB1/TIP60 to promote transcriptional activation, while others form the protein basis of the amyloid plaques found in the brains of patients with Alzheimer disease. In addition, two of the peptides are antimicrobial peptides, having been shown to have bactericidal and antifungal activities. Mutations in this gene have been implicated in autosomal dominant Alzheimer disease and cerebroarterial amyloidosis (cerebral amyloid angiopathy). Multiple transcript variants encoding several different isoforms have been found for this gene. amyloid beta precursor protein APP ENSG00000142192 NA
811 Calreticulin is a multifunctional protein that acts as a major Ca(2+)-binding (storage) protein in the lumen of the endoplasmic reticulum. It is also found in the nucleus, suggesting that it may have a role in transcription regulation. Calreticulin binds to the synthetic peptide KLGFFKR, which is almost identical to an amino acid sequence in the DNA-binding domain of the superfamily of nuclear receptors. Calreticulin binds to antibodies in certain sera of systemic lupus and Sjogren patients which contain anti-Ro/SSA antibodies, it is highly conserved among species, and it is located in the endoplasmic and sarcoplasmic reticulum where it may bind calcium. The amino terminus of calreticulin interacts with the DNA-binding domain of the glucocorticoid receptor and prevents the receptor from binding to its specific glucocorticoid response element. Calreticulin can inhibit the binding of androgen receptor to its hormone-responsive DNA element and can inhibit androgen receptor and retinoic acid receptor transcriptional activities in vivo, as well as retinoic acid-induced neuronal differentiation. Thus, calreticulin can act as an important modulator of the regulation of gene transcription by nuclear hormone receptors. Systemic lupus erythematosus is associated with increased autoantibody titers against calreticulin but calreticulin is not a Ro/SS-A antigen. Earlier papers referred to calreticulin as an Ro/SS-A antigen but this was later disproven. Increased autoantibody titer against human calreticulin is found in infants with complete congenital heart block of both the IgG and IgM classes. calreticulin CALR ENSG00000179218 NA
4072 This gene encodes a carcinoma-associated antigen and is a member of a family that includes at least two type I membrane proteins. This antigen is expressed on most normal epithelial cells and gastrointestinal carcinomas and functions as a homotypic calcium-independent cell adhesion molecule. The antigen is being used as a target for immunotherapy treatment of human carcinomas. Mutations in this gene result in congenital tufting enteropathy. epithelial cell adhesion molecule EPCAM ENSG00000119888 NA
57475 NA pleckstrin homology, MyTH4 and FERM domain containing H1 PLEKHH1 ENSG00000054690 NA
7033 Members of the trefoil family are characterized by having at least one copy of the trefoil motif, a 40-amino acid domain that contains three conserved disulfides. They are stable secretory proteins expressed in gastrointestinal mucosa. Their functions are not defined, but they may protect the mucosa from insults, stabilize the mucus layer and affect healing of the epithelium. This gene is expressed in goblet cells of the intestines and colon. This gene and two other related trefoil family member genes are found in a cluster on chromosome 21. trefoil factor 3 TFF3 ENSG00000160180 NA
7170 This gene encodes a member of the tropomyosin family of actin-binding proteins. Tropomyosins are dimers of coiled-coil proteins that provide stability to actin filaments and regulate access of other actin-binding proteins. Mutations in this gene result in autosomal dominant nemaline myopathy and other muscle disorders. This locus is involved in translocations with other loci, including anaplastic lymphoma receptor tyrosine kinase (ALK) and neurotrophic tyrosine kinase receptor type 1 (NTRK1), which result in the formation of fusion proteins that act as oncogenes. There are numerous pseudogenes for this gene on different chromosomes. Alternative splicing results in multiple transcript variants. tropomyosin 3 TPM3 ENSG00000143549 NA
57139 NA ral guanine nucleotide dissociation stimulator like 3 RGL3 ENSG00000205517 NA
6876 The protein encoded by this gene is a transformation and shape-change sensitive actin cross-linking/gelling protein found in fibroblasts and smooth muscle. Its expression is down-regulated in many cell lines, and this down-regulation may be an early and sensitive marker for the onset of transformation. A functional role of this protein is unclear. Two transcript variants encoding the same protein have been found for this gene. transgelin TAGLN ENSG00000149591 NA
255743 NA nephronectin NPNT ENSG00000168743 NA
9289 This gene encodes a member of the G protein-coupled receptor family and regulates brain cortical patterning. The encoded protein binds specifically to transglutaminase 2, a component of tissue and tumor stroma implicated as an inhibitor of tumor progression. Mutations in this gene are associated with a brain malformation known as bilateral frontoparietal polymicrogyria. Alternative splicing results in multiple transcript variants. adhesion G protein-coupled receptor G1 ADGRG1 ENSG00000205336 NA
440270 NA golgin A8 family member B GOLGA8B ENSG00000215252 NA
23015 The Golgi apparatus, which participates in glycosylation and transport of proteins and lipids in the secretory pathway, consists of a series of stacked, flattened membrane sacs referred to as cisternae. Interactions between the Golgi and microtubules are thought to be important for the reorganization of the Golgi after it fragments during mitosis. The golgins constitute a family of proteins which are localized to the Golgi. This gene encodes a golgin which structurally resembles its family member GOLGA2, suggesting that they may share a similar function. There are many similar copies of this gene on chromosome 15. Alternative splicing results in multiple transcript variants. golgin A8 family member A GOLGA8A ENSG00000215252 NA
3712 Isovaleryl-CoA dehydrogenase (IVD) is a mitochondrial matrix enzyme that catalyzes the third step in leucine catabolism. The genetic deficiency of IVD results in an accumulation of isovaleric acid, which is toxic to the central nervous system and leads to isovaleric acidemia. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. isovaleryl-CoA dehydrogenase IVD ENSG00000128928 NA
23787 This gene encodes a member of the mitochondrial carrier family. The encoded protein is localized to the mitochondrion inner membrane and induces apoptosis independent of the proapoptotic proteins Bax and Bak. Pseudogenes on chromosomes 6 and 11 have been identified for this gene. Alternatively spliced transcript variants encoding multiple isoforms have been observed. mitochondrial carrier 1 MTCH1 ENSG00000137409 NA
348 The protein encoded by this gene is a major apoprotein of the chylomicron. It binds to a specific liver and peripheral cell receptor, and is essential for the normal catabolism of triglyceride-rich lipoprotein constituents. This gene maps to chromosome 19 in a cluster with the related apolipoprotein C1 and C2 genes. Mutations in this gene result in familial dysbetalipoproteinemia, or type III hyperlipoproteinemia (HLP III), in which increased plasma cholesterol and triglycerides are the consequence of impaired clearance of chylomicron and VLDL remnants. Alternative splicing results in multiple transcript variants. apolipoprotein E APOE ENSG00000130203 NA
5037 This gene encodes a member of the phosphatidylethanolamine-binding family of proteins and has been shown to modulate multiple signaling pathways, including the MAP kinase (MAPK), NF-kappa B, and glycogen synthase kinase-3 (GSK-3) signaling pathways. The encoded protein can be further processed to form a smaller cleavage product, hippocampal cholinergic neurostimulating peptide (HCNP), which may be involved in neural development. This gene has been implicated in numerous human cancers and may act as a metastasis suppressor gene. Multiple pseudogenes of this gene have been identified in the genome. phosphatidylethanolamine binding protein 1 PEBP1 ENSG00000089220 NA
2670 This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. glial fibrillary acidic protein GFAP ENSG00000131095 NA
171024 NA synaptopodin 2 SYNPO2 ENSG00000172403 NA
27124 NA inositol polyphosphate-5-phosphatase J INPP5J ENSG00000185133 NA
375790 This gene encodes one of several proteins that are critical in the development of the neuromuscular junction (NMJ), as identified in mouse knock-out studies. The encoded protein contains several laminin G, Kazal type serine protease inhibitor, and epidermal growth factor domains. Additional post-translational modifications occur to add glycosaminoglycans and disulfide bonds. In one family with congenital myasthenic syndrome affecting limb-girdle muscles, a mutation in this gene was found. Alternative splicing results in multiple transcript variants encoding different isoforms. agrin AGRN ENSG00000188157 NA
2872 This gene encodes a member of the calcium/calmodulin-dependent protein kinases (CAMK) Ser/Thr protein kinase family, which belongs to the protein kinase superfamily. This protein contains conserved DLG (asp-leu-gly) and ENIL (glu-asn-ile-leu) motifs, and an N-terminal polybasic region which binds importin A and the translation factor scaffold protein eukaryotic initiation factor 4G (eIF4G). This protein is one of the downstream kinases activated by mitogen-activated protein (MAP) kinases. It phosphorylates the eukaryotic initiation factor 4E (eIF4E), thus playing important roles in the initiation of mRNA translation, oncogenic transformation and malignant cell proliferation. In addition to eIF4E, this protein also interacts with von Hippel-Lindau tumor suppressor (VHL), ring-box 1 (Rbx1) and Cullin2 (Cul2), which are all components of the CBC(VHL) ubiquitin ligase E3 complex. Multiple alternatively spliced transcript variants have been found, but the full-length nature and biological activity of only two variants are determined. These two variants encode distinct isoforms which differ in activity and regulation, and in subcellular localization. MAP kinase interacting serine/threonine kinase 2 MKNK2 ENSG00000099875 NA
10396 The P-type adenosinetriphosphatases (P-type ATPases) are a family of proteins which use the free energy of ATP hydrolysis to drive uphill transport of ions across membranes. Several subfamilies of P-type ATPases have been identified. One subfamily catalyzes transport of heavy metal ions. Another subfamily transports non-heavy metal ions (NMHI). The protein encoded by this gene is a member of the third subfamily of P-type ATPases and acts to transport amphipaths, such as phosphatidylserine. Two transcript variants encoding different isoforms have been found for this gene. ATPase phospholipid transporting 8A1 ATP8A1 ENSG00000124406 NA
6280 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. S100 calcium binding protein A9 S100A9 ENSG00000163220 NA
4494 NA metallothionein 1F MT1F ENSG00000198417 NA
3856 This gene is a member of the type II keratin family clustered on the long arm of chromosome 12. Type I and type II keratins heteropolymerize to form intermediate-sized filaments in the cytoplasm of epithelial cells. The product of this gene typically dimerizes with keratin 18 to form an intermediate filament in simple single-layered epithelial cells. This protein plays a role in maintaining cellular structural integrity and also functions in signal transduction and cellular differentiation. Mutations in this gene cause cryptogenic cirrhosis. Alternatively spliced transcript variants have been found for this gene. keratin 8 KRT8 ENSG00000170421 NA
146330 Members of the F-box protein family, such as FBXL16, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]). F-box and leucine rich repeat protein 16 FBXL16 ENSG00000127585 NA
84632 NA actin filament associated protein 1 like 2 AFAP1L2 ENSG00000169129 NA
83937 The function of this gene has not yet been determined but may involve a role in tumor suppression. Alternative splicing of this gene results in several transcript variants; however, most of the variants have not been fully described. Ras association domain family member 4 RASSF4 ENSG00000107551 NA
8991 This gene encodes a member of the selenium-binding protein family. Selenium is an essential nutrient that exhibits potent anticarcinogenic properties, and deficiency of selenium may cause certain neurologic diseases. The effects of selenium in preventing cancer and neurologic diseases may be mediated by selenium-binding proteins, and decreased expression of this gene may be associated with several types of cancer. The encoded protein may play a selenium-dependent role in ubiquitination/deubiquitination-mediated protein degradation. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. selenium binding protein 1 SELENBP1 ENSG00000143416 NA
7168 This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. tropomyosin 1 (alpha) TPM1 ENSG00000140416 NA
4495 NA metallothionein 1G MT1G ENSG00000125144 NA
100129518 NA uncharacterized LOC100129518 LOC100129518 ENSG00000112096 NA
6648 This gene is a member of the iron/manganese superoxide dismutase family. It encodes a mitochondrial protein that forms a homotetramer and binds one manganese ion per subunit. This protein binds to the superoxide byproducts of oxidative phosphorylation and converts them to hydrogen peroxide and diatomic oxygen. Mutations in this gene have been associated with idiopathic cardiomyopathy (IDC), premature aging, sporadic motor neuron disease, and cancer. Alternative splicing of this gene results in multiple transcript variants. A related pseudogene has been identified on chromosome 1. superoxide dismutase 2, mitochondrial SOD2 ENSG00000112096 NA
126393 This locus encodes a heat shock protein. The encoded protein likely plays a role in smooth muscle relaxation. heat shock protein family B (small) member 6 HSPB6 ENSG00000004776 NA
3798 This gene encodes a member of the kinesin family of proteins. Members of this family are part of a multisubunit complex that functions as a microtubule motor in intracellular organelle transport. Mutations in this gene cause autosomal dominant spastic paraplegia 10. kinesin family member 5A KIF5A ENSG00000155980 NA
283120 This gene is located in an imprinted region of chromosome 11 near the insulin-like growth factor 2 (IGF2) gene. This gene is only expressed from the maternally-inherited chromosome, whereas IGF2 is only expressed from the paternally-inherited chromosome. The product of this gene is a long non-coding RNA which functions as a tumor suppressor. Mutations in this gene have been associated with Beckwith-Wiedemann Syndrome and Wilms tumorigenesis. Alternative splicing results in multiple transcript variants. H19, imprinted maternally expressed transcript (non-protein coding) H19 ENSG00000130600 NA
2057 This gene encodes the erythropoietin receptor which is a member of the cytokine receptor family. Upon erythropoietin binding, this receptor activates Jak2 tyrosine kinase which activates different intracellular pathways including: Ras/MAP kinase, phosphatidylinositol 3-kinase and STAT transcription factors. The stimulated erythropoietin receptor appears to have a role in erythroid cell survival. Defects in the erythropoietin receptor may produce erythroleukemia and familial erythrocytosis. Dysregulation of this gene may affect the growth of certain tumors. Alternate splicing results in multiple transcript variants. erythropoietin receptor EPOR ENSG00000187266 NA
83959 This gene encodes a voltage-regulated, electrogenic sodium-coupled borate cotransporter that is essential for borate homeostasis, cell growth and cell proliferation. Mutations in this gene have been associated with a number of endothelial corneal dystrophies including recessive corneal endothelial dystrophy 2, corneal dystrophy and perceptive deafness, and Fuchs endothelial corneal dystrophy. Multiple transcript variants encoding different isoforms have been described. solute carrier family 4 member 11 SLC4A11 ENSG00000088836 NA
114822 NA rhophilin, Rho GTPase binding protein 1 RHPN1 ENSG00000158106 NA
65108 This gene encodes a member of the myristoylated alanine-rich C-kinase substrate (MARCKS) family. Members of this family play a role in cytoskeletal regulation, protein kinase C signaling and calmodulin signaling. The encoded protein affects the formation of adherens junction. Alternative splicing results in multiple transcript variants. Pseudogenes of this gene are located on the long arm of chromosomes 6 and 10. MARCKS like 1 MARCKSL1 ENSG00000175130 NA
91522 COL23A1 is a member of the transmembrane collagens, a subfamily of the nonfibrillar collagens that contain a single pass hydrophobic transmembrane domain (Banyard et al., 2003 [PubMed 12644459]). collagen type XXIII alpha 1 chain COL23A1 ENSG00000050767 NA
83483 NA plasmalemma vesicle associated protein PLVAP ENSG00000130300 NA
81618 NA integral membrane protein 2C ITM2C ENSG00000135916 NA
116496 NA family with sequence similarity 129 member A FAM129A ENSG00000135842 NA
1277 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. collagen type I alpha 1 COL1A1 ENSG00000108821 NA
2355 The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. FOS like 2, AP-1 transcription factor subunit FOSL2 ENSG00000075426 NA
25849 NA prostate androgen-regulated mucin-like protein 1 PARM1 ENSG00000169116 NA
6366 This antimicrobial gene is one of several CC cytokine genes clustered on the p-arm of chromosome 9. Cytokines are a family of secreted proteins involved in immunoregulatory and inflammatory processes. The CC cytokines are proteins characterized by two adjacent cysteines. Similar to other chemokines the protein encoded by this gene inhibits hemopoiesis and stimulates chemotaxis. This protein is chemotactic in vitro for thymocytes and activated T cells, but not for B cells, macrophages, or neutrophils. The cytokine encoded by this gene may also play a role in mediating homing of lymphocytes to secondary lymphoid organs. It is a high affinity functional ligand for chemokine receptor 7 that is expressed on T and B lymphocytes and a known receptor for another member of the cytokine family (small inducible cytokine A19). C-C motif chemokine ligand 21 CCL21 ENSG00000137077 NA
10398 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. myosin light chain 9 MYL9 ENSG00000101335 NA
3710 This gene encodes a receptor for inositol 1,4,5-trisphosphate, a second messenger that mediates the release of intracellular calcium. The receptor contains a calcium channel at the C-terminus and the ligand-binding site at the N-terminus. Knockout studies in mice suggest that type 2 and type 3 inositol 1,4,5-trisphosphate receptors play a key role in exocrine secretion underlying energy metabolism and growth. inositol 1,4,5-trisphosphate receptor type 3 ITPR3 ENSG00000096433 NA
6383 The protein encoded by this gene is a transmembrane (type I) heparan sulfate proteoglycan and is a member of the syndecan proteoglycan family. The syndecans mediate cell binding, cell signaling, and cytoskeletal organization and syndecan receptors are required for internalization of the HIV-1 tat protein. The syndecan-2 protein functions as an integral membrane protein and participates in cell proliferation, cell migration and cell-matrix interactions via its receptor for extracellular matrix proteins. Altered syndecan-2 expression has been detected in several different tumor types. syndecan 2 SDC2 ENSG00000169439 NA
50861 This gene encodes a protein which is a member of the stathmin protein family. Members of this protein family form a complex with tubulins at a ratio of 2 tubulins for each stathmin protein. Microtubules require the ordered assembly of alpha- and beta-tubulins, and formation of a complex with stathmin disrupts microtubule formation and function. A pseudogene of this gene is located on chromosome 22. Alternative splicing results in multiple transcript variants. stathmin 3 STMN3 ENSG00000197457 NA
5166 This gene is a member of the PDK/BCKDK protein kinase family and encodes a mitochondrial protein with a histidine kinase domain. This protein is located in the matrix of the mitochondria and inhibits the pyruvate dehydrogenase complex by phosphorylating one of its subunits, thereby contributing to the regulation of glucose metabolism. Expression of this gene is regulated by glucocorticoids, retinoic acid and insulin. pyruvate dehydrogenase kinase 4 PDK4 ENSG00000004799 NA
567 This gene encodes a serum protein found in association with the major histocompatibility complex (MHC) class I heavy chain on the surface of nearly all nucleated cells. The protein has a predominantly beta-pleated sheet structure that can form amyloid fibrils in some pathological conditions. The encoded antimicrobial protein displays antibacterial activity in amniotic fluid. A mutation in this gene has been shown to result in hypercatabolic hypoproteinemia. beta-2-microglobulin B2M ENSG00000166710 NA
4162 NA melanoma cell adhesion molecule MCAM ENSG00000076706 NA
1281 This gene encodes the pro-alpha1 chains of type III collagen, a fibrillar collagen that is found in extensible connective tissues such as skin, lung, uterus, intestine and the vascular system, frequently in association with type I collagen. Mutations in this gene are associated with Ehlers-Danlos syndrome type IV, and with aortic and arterial aneurysms. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. collagen type III alpha 1 chain COL3A1 ENSG00000168542 NA
11170 NA family with sequence similarity 107 member A FAM107A ENSG00000168309 NA
7094 This gene encodes a cytoskeletal protein that is concentrated in areas of cell-substratum and cell-cell contacts. The encoded protein plays a significant role in the assembly of actin filaments and in spreading and migration of various cell types, including fibroblasts and osteoclasts. It codistributes with integrins in the cell surface membrane in order to assist in the attachment of adherent cells to extracellular matrices and of lymphocytes to other cells. The N-terminus of this protein contains elements for localization to cell-extracellular matrix junctions. The C-terminus contains binding sites for proteins such as beta-1-integrin, actin, and vinculin. talin 1 TLN1 ENSG00000137076 NA
7422 This gene is a member of the PDGF/VEGF growth factor family. It encodes a heparin-binding protein, which exists as a disulfide-linked homodimer. This growth factor induces proliferation and migration of vascular endothelial cells, and is essential for both physiological and pathological angiogenesis. Disruption of this gene in mice resulted in abnormal embryonic blood vessel formation. This gene is upregulated in many known tumors and its expression is correlated with tumor stage and progression. Elevated levels of this protein are found in patients with POEMS syndrome, also known as Crow-Fukase syndrome. Allelic variants of this gene have been associated with microvascular complications of diabetes 1 (MVCD1) and atherosclerosis. Alternatively spliced transcript variants encoding different isoforms have been described. There is also evidence for alternative translation initiation from upstream non-AUG (CUG) codons resulting in additional isoforms. A recent study showed that a C-terminally extended isoform is produced by use of an alternative in-frame translation termination codon via a stop codon readthrough mechanism, and that this isoform is antiangiogenic. Expression of some isoforms derived from the AUG start codon is regulated by a small upstream open reading frame, which is located within an internal ribosome entry site. vascular endothelial growth factor A VEGFA ENSG00000112715 NA
3400 This gene encodes a member of the inhibitor of DNA binding (ID) protein family. These proteins are basic helix-loop-helix transcription factors which can act as tumor suppressors but lack DNA binding activity. Consequently, the activity of the encoded protein depends on the protein binding partner. inhibitor of DNA binding 4, HLH protein ID4 ENSG00000172201 NA
5121 NA Purkinje cell protein 4 PCP4 ENSG00000183036 NA
116984 The protein encoded by this gene contains ARF-GAP, RHO-GAP, ankyrin repeat, RAS-associating, and pleckstrin homology domains. The protein is a phosphatidylinositol (3,4,5)-trisphosphate-dependent Arf6 GAP that binds RhoA-GTP, but it lacks the predicted catalytic arginine in the RHO-GAP domain and does not have RHO-GAP activity. The protein associates with focal adhesions and functions downstream of RhoA to regulate focal adhesion dynamics. ArfGAP with RhoGAP domain, ankyrin repeat and PH domain 2 ARAP2 ENSG00000047365 NA
1893 This gene encodes a soluble protein that is involved in endochondral bone formation, angiogenesis, and tumor biology. It also interacts with a variety of extracellular and structural proteins, contributing to the maintenance of skin integrity and homeostasis. Mutations in this gene are associated with lipoid proteinosis disorder (also known as hyalinosis cutis et mucosae or Urbach-Wiethe disease) that is characterized by generalized thickening of skin, mucosae and certain viscera. Alternatively spliced transcript variants encoding distinct isoforms have been described for this gene. extracellular matrix protein 1 ECM1 ENSG00000143369 NA
1278 This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. collagen type I alpha 2 chain COL1A2 ENSG00000164692 NA
ENSG00000273149 NA NA RP11-290D2.6 ENSG00000273149 NA
6642 This gene encodes a member of the sorting nexin family. Members of this family contain a phox (PX) domain, which is a phosphoinositide binding domain, and are involved in intracellular trafficking. This endosomal protein regulates the cell-surface expression of epidermal growth factor receptor. This protein also has a role in sorting protease-activated receptor-1 from early endosomes to lysosomes. This protein may form oligomeric complexes with family members. This gene results in three transcript variants encoding distinct isoforms. sorting nexin 1 SNX1 ENSG00000028528 NA
146223 This gene belongs to the chemokine-like factor gene superfamily, a novel family that is similar to the chemokine and the transmembrane 4 superfamilies of signaling molecules. This gene is one of several chemokine-like factor genes located in a cluster on chromosome 16. Alternatively spliced transcript variants encoding different isoforms have been identified. CKLF like MARVEL transmembrane domain containing 4 CMTM4 ENSG00000183723 NA
1634 This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. decorin DCN ENSG00000011465 NA
1284 This gene encodes one of the six subunits of type IV collagen, the major structural component of basement membranes. The C-terminal portion of the protein, known as canstatin, is an inhibitor of angiogenesis and tumor growth. Like the other members of the type IV collagen gene family, this gene is organized in a head-to-head conformation with another type IV collagen gene so that each gene pair shares a common promoter. collagen type IV alpha 2 COL4A2 ENSG00000134871 NA
59 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. actin, alpha 2, smooth muscle, aorta ACTA2 ENSG00000107796 NA
4625 Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. myosin, heavy chain 7, cardiac muscle, beta MYH7 ENSG00000092054 NA
4155 The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. myelin basic protein MBP ENSG00000197971 NA
72 Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. actin, gamma 2, smooth muscle, enteric ACTG2 ENSG00000163017 NA
6525 This gene encodes a structural protein that is found exclusively in contractile smooth muscle cells. It associates with stress fibers and constitutes part of the cytoskeleton. This gene is localized to chromosome 22q12.3, distal to the TUPLE1 locus and outside the DiGeorge syndrome deletion. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. smoothelin SMTN ENSG00000183963 NA
57648 NA KIAA1522 KIAA1522 ENSG00000162522 NA
10406 This gene encodes a protein that is a member of the WFDC domain family. The WFDC domain, or WAP Signature motif, contains eight cysteines forming four disulfide bonds at the core of the protein, and functions as a protease inhibitor in many family members. This gene is expressed in pulmonary epithelial cells, and was also found to be expressed in some ovarian cancers. The encoded protein is a small secretory protein, which may be involved in sperm maturation. WAP four-disulfide core domain 2 WFDC2 ENSG00000101443 NA
10580 This gene encodes a CBL-associated protein which functions in the signaling and stimulation of insulin. Mutations in this gene may be associated with human disorders of insulin resistance. Alternative splicing results in multiple transcript variants. sorbin and SH3 domain containing 1 SORBS1 ENSG00000095637 NA
5493 The protein encoded by this gene is a component of desmosomes and of the epidermal cornified envelope in keratinocytes. The N-terminal domain of this protein interacts with the plasma membrane and its C-terminus interacts with intermediate filaments. Through its rod domain, this protein forms complexes with envoplakin. This protein may serve as a link between the cornified envelope and desmosomes as well as intermediate filaments. AKT1/PKB, a protein kinase mediating a variety of cell growth and survival signaling processes, is reported to interact with this protein, suggesting a possible role for this protein as a localization signal in AKT1-mediated signaling. periplakin PPL ENSG00000118898 NA
4666 This gene encodes a protein that associates with basic transcription factor 3 (BTF3) to form the nascent polypeptide-associated complex (NAC). This complex binds to nascent proteins that lack a signal peptide motif as they emerge from the ribosome, blocking interaction with the signal recognition particle (SRP) and preventing mistranslocation to the endoplasmic reticulum. This protein is an IgE autoantigen in atopic dermatitis patients. Alternative splicing results in multiple transcript variants, but the full length nature of some of these variants, including those encoding very large proteins, has not been determined. There are multiple pseudogenes of this gene on different chromosomes. nascent polypeptide-associated complex alpha subunit NACA ENSG00000196531 NA
10160 This gene encodes a protein containing a FERM (4.2, ezrin, radixin, moesin) domain, a Dbl homology domain, and two pleckstrin homology domains. These domains are found in guanine nucleotide exchange factors and proteins that link the cytoskeleton to the cell membrane. The encoded protein functions in neurons to promote dendritic growth. Alternative splicing results in multiple transcript variants. FERM, ARH/RhoGEF and pleckstrin domain protein 1 FARP1 ENSG00000152767 NA
6238 This gene encodes a ribosome-binding protein of the endoplasmic reticulum (ER) membrane. Studies suggest that this gene plays a role in ER proliferation, secretory pathways and secretory cell differentiation, and mediation of ER-microtubule interactions. Alternative splicing has been observed and protein isoforms are characterized by regions of N-terminal decapeptide and C-terminal heptad repeats. Splicing of the tandem repeats results in variations in ribosome-binding affinity and secretory function. The full-length nature of variants which differ in repeat length has not been determined. Pseudogenes of this gene have been identified on chromosomes 3 and 7, and RRBP1 has been excluded as a candidate gene in the cause of Alagille syndrome, the result of a mutation in a nearby gene on chromosome 20p12. ribosome binding protein 1 RRBP1 ENSG00000125844 NA
23015 The Golgi apparatus, which participates in glycosylation and transport of proteins and lipids in the secretory pathway, consists of a series of stacked, flattened membrane sacs referred to as cisternae. Interactions between the Golgi and microtubules are thought to be important for the reorganization of the Golgi after it fragments during mitosis. The golgins constitute a family of proteins which are localized to the Golgi. This gene encodes a golgin which structurally resembles its family member GOLGA2, suggesting that they may share a similar function. There are many similar copies of this gene on chromosome 15. Alternative splicing results in multiple transcript variants. golgin A8 family member A GOLGA8A ENSG00000175265 NA
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_", 10, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
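
The export step above hard-codes the factor index in the file name, so it is easy to query one factor and write to another factor's file. A minimal sketch of a wrapper that derives the file name from the index is shown below. The helper name `write_factor_genes` and its `out_dir` argument are hypothetical and not part of the original analysis; the example writes to a temporary directory rather than `../utilities/`.

```r
# Hypothetical helper (not in the original analysis): write the queried
# Ensembl IDs for factor k, one per line, with the file name derived from k.
write_factor_genes <- function(ids, k, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(as.character(ids), path,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)  # return the path so callers can check what was written
}

# Example with two query IDs taken from the Factor 10 table above,
# written to a temporary directory.
demo_ids  <- c("ENSG00000197457", "ENSG00000004799")
demo_path <- write_factor_genes(demo_ids, 10, tempdir())
readLines(demo_path)
```

In the analysis itself this would presumably be called as `write_factor_genes(out$query, 10, "../utilities/GTEX2013_sparse_load_sqrt")`, assuming that directory already exists.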

Factor 11 Annotations

out <- mygene::queryMany(gene_list[11,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id name query summary notfound
NRGN 4900 neurogranin ENSG00000154146 Neurogranin (NRGN) is the human homolog of the neuron-specific rat RC3/neurogranin gene. This gene encodes a postsynaptic protein kinase substrate that binds calmodulin in the absence of calcium. The NRGN gene contains four exons and three introns. The exons 1 and 2 encode the protein and exons 3 and 4 contain untranslated sequences. It is suggested that the NRGN is a direct target for thyroid hormone in human brain, and that control of expression of this gene could underlie many of the consequences of hypothyroidism on mental states during development as well as in adult subjects. NA
KIF5A 3798 kinesin family member 5A ENSG00000155980 This gene encodes a member of the kinesin family of proteins. Members of this family are part of a multisubunit complex that functions as a microtubule motor in intracellular organelle transport. Mutations in this gene cause autosomal dominant spastic paraplegia 10. NA
KRT10 3858 keratin 10 ENSG00000186395 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. NA
VIM 7431 vimentin ENSG00000026025 This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. NA
KRT1 3848 keratin 1 ENSG00000167768 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. NA
GAPDH 2597 glyceraldehyde-3-phosphate dehydrogenase ENSG00000111640 This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. NA
ABLIM1 3983 actin binding LIM protein 1 ENSG00000099204 This gene encodes a cytoskeletal LIM protein that binds to actin filaments via a domain that is homologous to erythrocyte dematin. LIM domains, found in over 60 proteins, play key roles in the regulation of developmental pathways. LIM domains also function as protein-binding interfaces, mediating specific protein-protein interactions. The protein encoded by this gene could mediate such interactions between actin filaments and cytoplasmic targets. Alternatively spliced transcript variants encoding different isoforms have been identified. NA
PKD1 5310 polycystin 1, transient receptor potential channel interacting ENSG00000008710 This gene encodes a member of the polycystin protein family. The encoded glycoprotein contains a large N-terminal extracellular region, multiple transmembrane domains and a cytoplasmic C-tail. It is an integral membrane protein that functions as a regulator of calcium permeable cation channels and intracellular calcium homoeostasis. It is also involved in cell-cell/matrix interactions and may modulate G-protein-coupled signal-transduction pathways. It plays a role in renal tubular development, and mutations in this gene cause autosomal dominant polycystic kidney disease type 1 (ADPKD1). ADPKD1 is characterized by the growth of fluid-filled cysts that replace normal renal tissue and result in end-stage renal failure. Splice variants encoding different isoforms have been noted for this gene. Also, six pseudogenes, closely linked in a known duplicated region on chromosome 16p, have been described. NA
GPX3 2878 glutathione peroxidase 3 ENSG00000211445 This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. NA
MDGA1 266727 MAM domain containing glycosylphosphatidylinositol anchor 1 ENSG00000112139 NA NA
FBXL16 146330 F-box and leucine rich repeat protein 16 ENSG00000127585 Members of the F-box protein family, such as FBXL16, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]). NA
CBLN3 643866 cerebellin 3 precursor ENSG00000139899 Members of the precerebellin family, such as CBLN3, contain a cerebellin motif (see CBLN1; MIM 600432) and a C-terminal C1q signature domain (see MIM 120550) that mediates trimeric assembly of atypical collagen complexes. However, precerebellins do not contain a collagen motif, suggesting that they are not conventional components of the extracellular matrix (Pang et al., 2000 [PubMed 10964938]). NA
CTXN1 404217 cortexin 1 ENSG00000178531 NA NA
KRT2 3849 keratin 2 ENSG00000172867 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. NA
ENC1 8507 ectodermal-neural cortex 1 ENSG00000171617 This gene encodes a member of the kelch-related family of actin-binding proteins. The encoded protein plays a role in the oxidative stress response as a regulator of the transcription factor Nrf2, and expression of this gene may play a role in malignant transformation. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
PSD 5662 pleckstrin and Sec7 domain containing ENSG00000059915 This gene encodes a pleckstrin homology and SEC7 domains-containing protein that functions as a guanine nucleotide exchange factor. The encoded protein regulates signal transduction by activating ADP-ribosylation factor 6. Alternative splicing results in multiple transcript variants. NA
RP5-940J5.9 ENSG00000269968 NA ENSG00000269968 NA NA
CHGB 1114 chromogranin B ENSG00000089199 This gene encodes a tyrosine-sulfated secretory protein abundant in peptidergic endocrine cells and neurons. This protein may serve as a precursor for regulatory peptides. NA
TIMP2 7077 TIMP metallopeptidase inhibitor 2 ENSG00000035862 This gene is a member of the TIMP gene family. The proteins encoded by this gene family are natural inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix. In addition to an inhibitory role against metalloproteinases, the encoded protein has a unique role among TIMP family members in its ability to directly suppress the proliferation of endothelial cells. As a result, the encoded protein may be critical to the maintenance of tissue homeostasis by suppressing the proliferation of quiescent tissues in response to angiogenic factors, and by inhibiting protease activity in tissues undergoing remodelling of the extracellular matrix. NA
CHN1 1123 chimerin 1 ENSG00000128656 This gene encodes GTPase-activating protein for ras-related p21-rac and a phorbol ester receptor. It is predominantly expressed in neurons, and plays an important role in neuronal signal-transduction mechanisms. Mutations in this gene are associated with Duane’s retraction syndrome 2 (DURS2). Alternatively spliced transcript variants encoding different isoforms have been described for this gene. NA
PERP 64065 PERP, TP53 apoptosis effector ENSG00000112378 NA NA
DSP 1832 desmoplakin ENSG00000096696 This gene encodes a protein that anchors intermediate filaments to desmosomal plaques and forms an obligate component of functional desmosomes. Mutations in this gene are the cause of several cardiomyopathies and keratodermas, including skin fragility-woolly hair syndrome. Alternative splicing results in multiple transcript variants. NA
NA NA NA ENSG00000163486 NA TRUE
SFTPB 6439 surfactant protein B ENSG00000168878 This gene encodes the pulmonary-associated surfactant protein B (SPB), an amphipathic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. The SPB enhances the rate of spreading and increases the stability of surfactant monolayers in vitro. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 1, also called pulmonary alveolar proteinosis due to surfactant protein B deficiency, and are associated with fatal respiratory distress in the neonatal period. Alternatively spliced transcript variants encoding the same protein have been identified. NA
ACTB 60 actin, beta ENSG00000075624 This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. NA
TIAM1 7074 T-cell lymphoma invasion and metastasis 1 ENSG00000156299 NA NA
ITM2C 81618 integral membrane protein 2C ENSG00000135916 NA NA
CADPS2 93664 Ca2+ dependent secretion activator 2 ENSG00000081803 This gene encodes a member of the calcium-dependent activator of secretion (CAPS) protein family, which are calcium binding proteins that regulate the exocytosis of synaptic and dense-core vesicles in neurons and neuroendocrine cells. Mutations in this gene may contribute to autism susceptibility. Multiple transcript variants encoding different isoforms have been found for this gene. NA
MYH6 4624 myosin, heavy chain 6, cardiac muscle, alpha ENSG00000197616 Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. NA
GNAS 2778 GNAS complex locus ENSG00000087460 This locus has a highly complex imprinted expression pattern. It gives rise to maternally, paternally, and biallelically expressed transcripts that are derived from four alternative promoters and 5’ exons. Some transcripts contain a differentially methylated region (DMR) at their 5’ exons, and this DMR is commonly found in imprinted genes and correlates with transcript expression. An antisense transcript is produced from an overlapping locus on the opposite strand. One of the transcripts produced from this locus, and the antisense transcript, are paternally expressed noncoding RNAs, and may regulate imprinting in this region. In addition, one of the transcripts contains a second overlapping ORF, which encodes a structurally unrelated protein - Alex. Alternative splicing of downstream exons is also observed, which results in different forms of the stimulatory G-protein alpha subunit, a key element of the classical signal transduction pathway linking receptor-ligand interactions with the activation of adenylyl cyclase and a variety of cellular responses. Multiple transcript variants encoding different isoforms have been found for this gene. Mutations in this gene result in pseudohypoparathyroidism type 1a, pseudohypoparathyroidism type 1b, Albright hereditary osteodystrophy, pseudopseudohypoparathyroidism, McCune-Albright syndrome, progressive osseous heteroplasia, polyostotic fibrous dysplasia of bone, and some pituitary tumors. NA
SOGA1 140710 suppressor of glucose, autophagy associated 1 ENSG00000149639 NA NA
SFTPA2 729238 surfactant protein A2 ENSG00000185303 This gene is one of several genes encoding pulmonary-surfactant associated proteins (SFTPA) located on chromosome 10. Mutations in this gene and a highly similar gene located nearby, which affect the highly conserved carbohydrate recognition domain, are associated with idiopathic pulmonary fibrosis. The current version of the assembly displays only a single centromeric SFTPA gene pair rather than the two gene pairs shown in the previous assembly which were thought to have resulted from a duplication. NA
CHGA 1113 chromogranin A ENSG00000100604 The protein encoded by this gene is a member of the chromogranin/secretogranin family of neuroendocrine secretory proteins. It is found in secretory vesicles of neurons and endocrine cells. This gene product is a precursor to three biologically active peptides; vasostatin, pancreastatin, and parastatin. These peptides act as autocrine or paracrine negative modulators of the neuroendocrine system. Two other peptides, catestatin and chromofungin, have antimicrobial activity and antifungal activity, respectively. Two transcript variants encoding different isoforms have been found for this gene. NA
TSPAN9 10867 tetraspanin 9 ENSG00000011105 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. Alternatively spliced transcripts encoding the same protein have been identified. NA
VIM-AS1 100507347 VIM antisense RNA 1 ENSG00000229124 NA NA
NA NA NA ENSG00000117289 NA TRUE
MICAL2 9645 microtubule associated monooxygenase, calponin and LIM domain containing 2 ENSG00000133816 NA NA
FKBP5 2289 FK506 binding protein 5 ENSG00000096060 The protein encoded by this gene is a member of the immunophilin protein family, which play a role in immunoregulation and basic cellular processes involving protein folding and trafficking. This encoded protein is a cis-trans prolyl isomerase that binds to the immunosuppressants FK506 and rapamycin. It is thought to mediate calcineurin inhibition. It also interacts functionally with mature hetero-oligomeric progesterone receptor complexes along with the 90 kDa heat shock protein and P23 protein. This gene has been found to have multiple polyadenylation sites. Alternative splicing results in multiple transcript variants. NA
KRT14 3861 keratin 14 ENSG00000186847 This gene encodes a member of the keratin family, the most diverse group of intermediate filaments. This gene product, a type I keratin, is usually found as a heterotetramer with two keratin 5 molecules, a type II keratin. Together they form the cytoskeleton of epithelial cells. Mutations in the genes for these keratins are associated with epidermolysis bullosa simplex. At least one pseudogene has been identified at 17p12-p11. NA
MCF2L 23263 MCF.2 cell line derived transforming sequence like ENSG00000126217 This gene encodes a guanine nucleotide exchange factor that interacts specifically with the GTP-bound Rac1 and plays a role in the Rho/Rac signaling pathways. A variant in this gene was associated with osteoarthritis. Alternative splicing results in multiple transcript variants. NA
PPFIA4 8497 PTPRF interacting protein alpha 4 ENSG00000143847 PPFIA4, or liprin-alpha-4, belongs to the liprin-alpha gene family. See liprin-alpha-1 (LIP1, or PPFIA1; MIM 611054) for background on liprins. NA
TLE2 7089 transducin like enhancer of split 2 ENSG00000065717 NA NA
SFTPC 6440 surfactant protein C ENSG00000168484 This gene encodes the pulmonary-associated surfactant protein C (SPC), an extremely hydrophobic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 2, also called pulmonary alveolar proteinosis due to surfactant protein C deficiency, and are associated with interstitial lung disease in older infants, children, and adults. Alternatively spliced transcript variants encoding different protein isoforms have been identified. NA
SFTPA1 653509 surfactant protein A1 ENSG00000122852 This gene encodes a lung surfactant protein that is a member of a subfamily of C-type lectins called collectins. The encoded protein binds specific carbohydrate moieties found on lipids and on the surface of microorganisms. This protein plays an essential role in surfactant homeostasis and in the defense against respiratory pathogens. Mutations in this gene are associated with idiopathic pulmonary fibrosis. Alternate splicing results in multiple transcript variants. NA
RP11-124N14.3 ENSG00000234961 NA ENSG00000234961 NA NA
ERRFI1 54206 ERBB receptor feedback inhibitor 1 ENSG00000116285 ERRFI1 is a cytoplasmic protein whose expression is upregulated with cell growth (Wick et al., 1995 [PubMed 7641805]). It shares significant homology with the protein product of rat gene-33, which is induced during cell stress and mediates cell signaling (Makkinje et al., 2000 [PubMed 10749885]; Fiorentino et al., 2000 [PubMed 11003669]). NA
FKBP8 23770 FK506 binding protein 8 ENSG00000105701 The protein encoded by this gene is a member of the immunophilin protein family, which play a role in immunoregulation and basic cellular processes involving protein folding and trafficking. Unlike the other members of the family, this encoded protein does not seem to have PPIase/rotamase activity. It may have a role in neurons associated with memory function. NA
PKM 5315 pyruvate kinase, muscle ENSG00000067225 This gene encodes a protein involved in glycolysis. The encoded protein is a pyruvate kinase that catalyzes the transfer of a phosphoryl group from phosphoenolpyruvate to ADP, generating ATP and pyruvate. This protein has been shown to interact with thyroid hormone and may mediate cellular metabolic effects induced by thyroid hormones. This protein has been found to bind Opa protein, a bacterial outer membrane protein involved in gonococcal adherence to and invasion of human cells, suggesting a role of this protein in bacterial pathogenesis. Several alternatively spliced transcript variants encoding a few distinct isoforms have been reported. NA
THY1 7070 Thy-1 cell surface antigen ENSG00000154096 This gene encodes a cell surface glycoprotein and member of the immunoglobulin superfamily of proteins. The encoded protein is involved in cell adhesion and cell communication in numerous cell types, but particularly in cells of the immune and nervous systems. The encoded protein is widely used as a marker for hematopoietic stem cells. This gene may function as a tumor suppressor in nasopharyngeal carcinoma. Alternative splicing results in multiple transcript variants. NA
NPPA 4878 natriuretic peptide A ENSG00000175206 The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. NA
FASN 2194 fatty acid synthase ENSG00000169710 The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. NA
ALS2 57679 ALS2, alsin Rho guanine nucleotide exchange factor ENSG00000003393 The protein encoded by this gene contains an ATS1/RCC1-like domain, a RhoGEF domain, and a vacuolar protein sorting 9 (VPS9) domain, all of which are guanine-nucleotide exchange factors that activate members of the Ras superfamily of GTPases. The protein functions as a guanine nucleotide exchange factor for the small GTPase RAB5. The protein localizes with RAB5 on early endosomal compartments, and functions as a modulator for endosomal dynamics. Mutations in this gene result in several forms of juvenile lateral sclerosis and infantile-onset ascending spastic paralysis. Multiple transcript variants encoding different isoforms have been found for this gene. NA
HSP90AA1 3320 heat shock protein 90kDa alpha family class A member 1 ENSG00000080824 The protein encoded by this gene is an inducible molecular chaperone that functions as a homodimer. The encoded protein aids in the proper folding of specific target proteins by use of an ATPase activity that is modulated by co-chaperones. Two transcript variants encoding different isoforms have been found for this gene. NA
COL27A1 85301 collagen type XXVII alpha 1 ENSG00000196739 This gene encodes a member of the fibrillar collagen family, and plays a role during the calcification of cartilage and the transition of cartilage to bone. The encoded protein product is a preproprotein. It includes an N-terminal signal peptide, which is followed by an N-terminal propeptide, mature peptide and a C-terminal propeptide. The N-terminal propeptide contains thrombospondin N-terminal-like and laminin G-like domains. The mature peptide is a major triple-helical region. The C-terminal propeptide, also known as COLFI domain, plays crucial roles in tissue growth and repair. Mutations in this gene cause Steel syndrome. Alternatively spliced transcript variants have been found, but the full-length nature of some variants has not been determined. NA
SBSN 374897 suprabasin ENSG00000189001 NA NA
FOXN3 1112 forkhead box N3 ENSG00000053254 This gene is a member of the forkhead/winged helix transcription factor family. Checkpoints are eukaryotic DNA damage-inducible cell cycle arrests at G1 and G2. Checkpoint suppressor 1 suppresses multiple yeast checkpoint mutations including mec1, rad9, rad53 and dun1 by activating a MEC1-independent checkpoint pathway. Alternative splicing is observed at the locus, resulting in distinct isoforms. NA
BHLHE40 8553 basic helix-loop-helix family member e40 ENSG00000134107 This gene encodes a basic helix-loop-helix protein expressed in various tissues. The encoded protein can interact with ARNTL or compete for E-box binding sites in the promoter of PER1 and repress CLOCK/ARNTL’s transactivation of PER1. This gene is believed to be involved in the control of circadian rhythm and cell differentiation. NA
PDK4 5166 pyruvate dehydrogenase kinase 4 ENSG00000004799 This gene is a member of the PDK/BCKDK protein kinase family and encodes a mitochondrial protein with a histidine kinase domain. This protein is located in the matrix of the mitochondria and inhibits the pyruvate dehydrogenase complex by phosphorylating one of its subunits, thereby contributing to the regulation of glucose metabolism. Expression of this gene is regulated by glucocorticoids, retinoic acid and insulin. NA
ITGA5 3678 integrin subunit alpha 5 ENSG00000161638 The product of this gene belongs to the integrin alpha chain family. Integrins are heterodimeric integral membrane proteins composed of an alpha subunit and a beta subunit that function in cell surface adhesion and signaling. The encoded preproprotein is proteolytically processed to generate light and heavy chains that comprise the alpha 5 subunit. This subunit associates with the beta 1 subunit to form a fibronectin receptor. This integrin may promote tumor invasion, and higher expression of this gene may be correlated with shorter survival time in lung cancer patients. Note that the integrin alpha 5 and integrin alpha V subunits are encoded by distinct genes. NA
BAIAP3 8938 BAI1 associated protein 3 ENSG00000007516 This p53-target gene encodes a brain-specific angiogenesis inhibitor. The protein is a seven-span transmembrane protein and a member of the secretin receptor family. It interacts with the cytoplasmic region of brain-specific angiogenesis inhibitor 1. This protein also contains two C2 domains, which are often found in proteins involved in signal transduction or membrane trafficking. Its expression pattern and similarity to other proteins suggest that it may be involved in synaptic functions. Several transcript variants encoding different isoforms have been found for this gene. NA
LAMA5 3911 laminin subunit alpha 5 ENSG00000130702 This gene encodes one of the vertebrate laminin alpha chains. Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 non-identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. The protein encoded by this gene is the alpha-5 subunit of laminin-10 (laminin-511), laminin-11 (laminin-521) and laminin-15 (laminin-523). NA
PATJ 10207 PATJ, crumbs cell polarity complex component ENSG00000132849 This gene encodes a protein with multiple PDZ domains. PDZ domains mediate protein-protein interactions, and proteins with multiple PDZ domains often organize multimeric complexes at the plasma membrane. This protein localizes to tight junctions and to the apical membrane of epithelial cells. A similar protein in Drosophila is a scaffolding protein which tethers several members of a multimeric signaling complex in photoreceptors. NA
TGM2 7052 transglutaminase 2 ENSG00000198959 Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene acts as a monomer, is induced by retinoic acid, and appears to be involved in apoptosis. Finally, the encoded protein is the autoantigen implicated in celiac disease. Two transcript variants encoding different isoforms have been found for this gene. NA
PKIB 5570 protein kinase (cAMP-dependent, catalytic) inhibitor beta ENSG00000135549 This gene encodes a member of the cAMP-dependent protein kinase inhibitor family. The encoded protein may play a role in the protein kinase A (PKA) pathway by interacting with the catalytic subunit of PKA, and overexpression of this gene may play a role in prostate cancer. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
OAZ1 4946 ornithine decarboxylase antizyme 1 ENSG00000104904 The protein encoded by this gene belongs to the ornithine decarboxylase antizyme family, which plays a role in cell growth and proliferation by regulating intracellular polyamine levels. Expression of antizymes requires +1 ribosomal frameshifting, which is enhanced by high levels of polyamines. Antizymes in turn bind to and inhibit ornithine decarboxylase (ODC), the key enzyme in polyamine biosynthesis; thus, completing the auto-regulatory circuit. This gene encodes antizyme 1, the first member of the antizyme family, that has broad tissue distribution, and negatively regulates intracellular polyamine levels by binding to and targeting ODC for degradation, as well as inhibiting polyamine uptake. Antizyme 1 mRNA contains two potential in-frame AUGs; and studies in rat suggest that alternative use of the two translation initiation sites results in N-terminally distinct protein isoforms with different subcellular localization. Alternatively spliced transcript variants have also been noted for this gene. NA
DMKN 93099 dermokine ENSG00000161249 This gene is upregulated in inflammatory diseases, and it was first observed as expressed in the differentiated layers of skin. The most interesting aspect of this gene is the differential use of promoters and terminators to generate isoforms with unique cellular distributions and domain components. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. NA
CAMK2N1 55450 calcium/calmodulin dependent protein kinase II inhibitor 1 ENSG00000162545 NA NA
CHD7 55636 chromodomain helicase DNA binding protein 7 ENSG00000171316 This gene encodes a protein that contains several helicase family domains. Mutations in this gene have been found in some patients with the CHARGE syndrome. Two transcript variants encoding different isoforms have been found for this gene. NA
CALML5 51806 calmodulin like 5 ENSG00000178372 This gene encodes a novel calcium binding protein expressed in the epidermis and related to the calmodulin family of calcium binding proteins. Functional studies with recombinant protein demonstrate it does bind calcium and undergoes a conformational change when it does so. Abundant expression is detected only in reconstructed epidermis and is restricted to differentiating keratinocytes. In addition, it can associate with transglutaminase 3, shown to be a key enzyme in the terminal differentiation of keratinocytes. NA
PDIA2 64714 protein disulfide isomerase family A member 2 ENSG00000185615 Protein disulfide isomerases (EC 5.3.4.1), such as PDIP, are endoplasmic reticulum (ER) resident proteins that catalyze protein folding and thiol-disulfide interchange reactions (Desilva et al., 1996 [PubMed 8561901]). NA
PKP1 5317 plakophilin 1 ENSG00000081277 This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This protein may be involved in molecular recruitment and stabilization during desmosome formation. Mutations in this gene have been associated with the ectodermal dysplasia/skin fragility syndrome. Two transcript variants encoding different isoforms have been found for this gene. NA
ALDOA 226 aldolase, fructose-bisphosphate A ENSG00000149925 The protein encoded by this gene, Aldolase A (fructose-bisphosphate aldolase), is a glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Three aldolase isozymes (A, B, and C), encoded by three different genes, are differentially expressed during development. Aldolase A is found in the developing embryo and is produced in even greater amounts in adult muscle. Aldolase A expression is repressed in adult liver, kidney and intestine and similar to aldolase C levels in brain and other nervous tissue. Aldolase A deficiency has been associated with myopathy and hemolytic anemia. Alternative splicing and alternative promoter usage results in multiple transcript variants. Related pseudogenes have been identified on chromosomes 3 and 10. NA
PRRC2A 7916 proline rich coiled-coil 2A ENSG00000204469 A cluster of genes, BAT1-BAT5, has been localized in the vicinity of the genes for TNF alpha and TNF beta. These genes are all within the human major histocompatibility complex class III region. This gene has microsatellite repeats which are associated with the age-at-onset of insulin-dependent diabetes mellitus (IDDM) and possibly thought to be involved with the inflammatory process of pancreatic beta-cell destruction during the development of IDDM. This gene is also a candidate gene for the development of rheumatoid arthritis. Two transcript variants encoding the same protein have been found for this gene. NA
SLC38A1 81539 solute carrier family 38 member 1 ENSG00000111371 Amino acid transporters play essential roles in the uptake of nutrients, production of energy, chemical metabolism, detoxification, and neurotransmitter cycling. SLC38A1 is an important transporter of glutamine, an intermediate in the detoxification of ammonia and the production of urea. Glutamine serves as a precursor for the synaptic transmitter, glutamate (Gu et al., 2001 [PubMed 11325958]). NA
ADARB1 104 adenosine deaminase, RNA specific B1 ENSG00000197381 This gene encodes the enzyme responsible for pre-mRNA editing of the glutamate receptor subunit B by site-specific deamination of adenosines. Studies in rat found that this enzyme acted on its own pre-mRNA molecules to convert an AA dinucleotide to an AI dinucleotide which resulted in a new splice site. Alternative splicing of this gene results in several transcript variants, some of which have been characterized by the presence or absence of an ALU cassette insert and a short or long C-terminal region. NA
HSPB7 27129 heat shock protein family B (small) member 7 ENSG00000173641 NA NA
TPM3 7170 tropomyosin 3 ENSG00000143549 This gene encodes a member of the tropomyosin family of actin-binding proteins. Tropomyosins are dimers of coiled-coil proteins that provide stability to actin filaments and regulate access of other actin-binding proteins. Mutations in this gene result in autosomal dominant nemaline myopathy and other muscle disorders. This locus is involved in translocations with other loci, including anaplastic lymphoma receptor tyrosine kinase (ALK) and neurotrophic tyrosine kinase receptor type 1 (NTRK1), which result in the formation of fusion proteins that act as oncogenes. There are numerous pseudogenes for this gene on different chromosomes. Alternative splicing results in multiple transcript variants. NA
UCHL1 7345 ubiquitin C-terminal hydrolase L1 ENSG00000154277 The protein encoded by this gene belongs to the peptidase C12 family. This enzyme is a thiol protease that hydrolyzes a peptide bond at the C-terminal glycine of ubiquitin. This gene is specifically expressed in the neurons and in cells of the diffuse neuroendocrine system. Mutations in this gene may be associated with Parkinson disease. NA
LOR 4014 loricrin ENSG00000203782 This gene encodes loricrin, a major protein component of the cornified cell envelope found in terminally differentiated epidermal cells. Mutations in this gene are associated with Vohwinkel’s syndrome and progressive symmetric erythrokeratoderma, both inherited skin diseases. NA
CTC-251D13.1 ENSG00000271795 NA ENSG00000271795 NA NA
TNK2 10188 tyrosine kinase non receptor 2 ENSG00000061938 This gene encodes a tyrosine kinase that binds Cdc42Hs in its GTP-bound form and inhibits both the intrinsic and GTPase-activating protein (GAP)-stimulated GTPase activity of Cdc42Hs. This binding is mediated by a unique sequence of 47 amino acids C-terminal to an SH3 domain. The protein may be involved in a regulatory mechanism that sustains the GTP-bound active form of Cdc42Hs and which is directly linked to a tyrosine phosphorylation signal transduction pathway. Several alternatively spliced transcript variants have been identified from this gene, but the full-length nature of only two transcript variants has been determined. NA
RGS14 10636 regulator of G-protein signaling 14 ENSG00000169220 This gene encodes a member of the regulator of G-protein signaling family. This protein contains one RGS domain, two Raf-like Ras-binding domains (RBDs), and one GoLoco domain. The protein attenuates the signaling activity of G-proteins by binding, through its GoLoco domain, to specific types of activated, GTP-bound G alpha subunits. Acting as a GTPase activating protein (GAP), the protein increases the rate of conversion of the GTP to GDP. This hydrolysis allows the G alpha subunits to bind G beta/gamma subunit heterodimers, forming inactive G-protein heterotrimers, thereby terminating the signal. Alternate transcriptional splice variants of this gene have been observed but have not been thoroughly characterized. NA
FBRS 64319 fibrosin ENSG00000156860 Fibrosin is a lymphokine secreted by activated lymphocytes that induces fibroblast proliferation (Prakash and Robbins, 1998 [PubMed 9809749]). NA
MAST3 23031 microtubule associated serine/threonine kinase 3 ENSG00000099308 NA NA
CLIP1 6249 CAP-Gly domain containing linker protein 1 ENSG00000130779 The protein encoded by this gene links endocytic vesicles to microtubules. This gene is highly expressed in Reed-Sternberg cells of Hodgkin disease. Several transcript variants encoding different isoforms have been found for this gene. NA
MT3 4504 metallothionein 3 ENSG00000087250 NA NA
KLF9 687 Kruppel like factor 9 ENSG00000119138 The protein encoded by this gene is a transcription factor that binds to GC box elements located in the promoter. Binding of the encoded protein to a single GC box inhibits mRNA expression while binding to tandemly repeated GC box elements activates transcription. NA
LDLRAP1 26119 low density lipoprotein receptor adaptor protein 1 ENSG00000157978 The protein encoded by this gene is a cytosolic protein which contains a phosphotyrosine binding (PTD) domain. The PTD domain has been found to interact with the cytoplasmic tail of the LDL receptor. Mutations in this gene lead to LDL receptor malfunction and cause the disorder autosomal recessive hypercholesterolaemia. NA
ELN 2006 elastin ENSG00000049540 This gene encodes a protein that is one of the two components of elastic fibers. The encoded protein is rich in hydrophobic amino acids such as glycine and proline, which form mobile hydrophobic regions bounded by crosslinks between lysine residues. Deletions and mutations in this gene are associated with supravalvular aortic stenosis (SVAS) and autosomal dominant cutis laxa. Multiple transcript variants encoding different isoforms have been found for this gene. NA
ATHL1 80162 ATH1, acid trehalase-like 1 (yeast) ENSG00000142102 NA NA
FXYD7 53822 FXYD domain containing ion transport regulator 7 ENSG00000221946 This reference sequence was derived from multiple replicate ESTs and validated by similar human genomic sequence. This gene encodes a member of a family of small membrane proteins that share a 35-amino acid signature sequence domain, beginning with the sequence PFXYD and containing 7 invariant and 6 highly conserved amino acids. The approved human gene nomenclature for the family is FXYD-domain containing ion transport regulator. Transmembrane topology has been established for two family members (FXYD1 and FXYD2), with the N-terminus extracellular and the C-terminus on the cytoplasmic side of the membrane. FXYD2, also known as the gamma subunit of the Na,K-ATPase, regulates the properties of that enzyme. FXYD1 (phospholemman), FXYD2 (gamma), FXYD3 (MAT-8), FXYD4 (CHIF), and FXYD5 (RIC) have been shown to induce channel activity in experimental expression systems. This gene product, FXYD7, is novel and has not been characterized as a protein. [RefSeq curation by Kathleen J. Sweadner, Ph.D., sweadner@helix.mgh.harvard.edu., Dec 2000]. NA
CALM2 805 calmodulin 2 (phosphorylase kinase, delta) ENSG00000143933 This gene is a member of the calmodulin gene family. There are three distinct calmodulin genes dispersed throughout the genome that encode the identical protein, but differ at the nucleotide level. Calmodulin is a calcium binding protein that plays a role in signaling pathways, cell cycle progression and proliferation. Several infants with severe forms of long-QT syndrome (LQTS) who displayed life-threatening ventricular arrhythmias together with delayed neurodevelopment and epilepsy were found to have mutations in either this gene or another member of the calmodulin gene family (PMID:23388215). Mutations in this gene have also been identified in patients with less severe forms of LQTS (PMID:24917665), while mutations in another calmodulin gene family member have been associated with catecholaminergic polymorphic ventricular tachycardia (CPVT)(PMID:23040497), a rare disorder thought to be the cause of a significant fraction of sudden cardiac deaths in young individuals. Pseudogenes of this gene are found on chromosomes 10, 13, and 17. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
HLA-B 3106 major histocompatibility complex, class I, B ENSG00000234745 HLA-B belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon 1 encodes the leader peptide, exon 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Hundreds of HLA-B alleles have been described. NA
MRVI1 10335 murine retrovirus integration site 1 homolog ENSG00000072952 This gene is similar to a putative mouse tumor suppressor gene (Mrvi1) that is frequently disrupted by mouse AIDS-related virus (MRV). The encoded protein, which is found in the membrane of the endoplasmic reticulum, is similar to Jaw1, a lymphoid-restricted protein whose expression is down-regulated during lymphoid differentiation. This protein is a substrate of cGMP-dependent kinase-1 (PKG1) that can function as a regulator of IP3-induced calcium release. Studies in mouse suggest that MRV integration at Mrvi1 induces myeloid leukemia by altering the expression of a gene important for myeloid cell growth and/or differentiation, and thus this gene may function as a myeloid leukemia tumor suppressor gene. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene, and alternative translation start sites, including a non-AUG (CUG) start site, are used. NA
IFITM3 10410 interferon induced transmembrane protein 3 ENSG00000142089 The protein encoded by this gene is an interferon-induced membrane protein that helps confer immunity to influenza A H1N1 virus, West Nile virus, and dengue virus. Two transcript variants, only one of them protein-coding, have been found for this gene. Another variant encoding an N-terminally truncated isoform has been reported, but the full-length nature of this variant has not been determined. NA
MT1X 4501 metallothionein 1X ENSG00000187193 NA NA
TMEM178A 130733 transmembrane protein 178A ENSG00000152154 NA NA
ICAM5 7087 intercellular adhesion molecule 5 ENSG00000105376 The protein encoded by this gene is a member of the intercellular adhesion molecule (ICAM) family. All ICAM proteins are type I transmembrane glycoproteins, contain 2-9 immunoglobulin-like C2-type domains, and bind to the leukocyte adhesion LFA-1 protein. This protein is expressed on the surface of telencephalic neurons and displays two types of adhesion activity, homophilic binding between neurons and heterophilic binding between neurons and leukocytes. It may be a critical component in neuron-microglial cell interactions in the course of normal development or as part of neurodegenerative diseases. NA
BSG 682 basigin (Ok blood group) ENSG00000172270 The protein encoded by this gene is a plasma membrane protein that is important in spermatogenesis, embryo implantation, neural network formation, and tumor progression. The encoded protein is also a member of the immunoglobulin superfamily. Multiple transcript variants encoding different isoforms have been found for this gene. NA
COL6A2 1292 collagen type VI alpha 2 ENSG00000142173 This gene encodes one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The product of this gene contains several domains similar to von Willebrand Factor type A domains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in this gene are associated with Bethlem myopathy and Ullrich scleroatonic muscular dystrophy. Three transcript variants have been identified for this gene. NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_",11,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
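
The annotation tables in this report do not all share one column layout: the Factor 11 rows above appear to end with a not-found flag, while the Factor 12 table below starts with X_id and has no such column. A minimal sketch of one way to give every factor's table the same columns before calling kable is shown here. The helper name standardize_annotations and the toy input are hypothetical and not part of the original analysis; the column names are taken from the requested fields (name, summary, symbol) plus query.

```r
# Hypothetical helper (an assumption, not from the original analysis):
# keep a fixed set of annotation columns in a fixed order, adding any
# column that the query result did not return as an all-NA column.
standardize_annotations <- function(out,
                                    cols = c("query", "symbol", "name", "summary")) {
  df <- as.data.frame(out)
  for (col in setdiff(cols, names(df))) {
    df[[col]] <- rep(NA_character_, nrow(df))
  }
  df[, cols, drop = FALSE]
}

# Toy input imitating one matched and one unmatched query.
# The "summary" column is deliberately absent to exercise the fill-in step.
toy <- data.frame(
  X_id   = c("7431", NA),
  symbol = c("VIM", NA),
  query  = c("ENSG00000026025", "ENSG00000117289"),
  name   = c("vimentin", NA),
  stringsAsFactors = FALSE
)

standardized <- standardize_annotations(toy)
print(standardized)
```

In the report itself, the result of mygene::queryMany would take the place of toy, and kable(standardize_annotations(out)) would replace kable(as.data.frame(out)).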

Factor 12 Annotations

out <- mygene::queryMany(gene_list[12,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id symbol summary query name
7431 VIM This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. ENSG00000026025 vimentin
1674 DES This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. ENSG00000175084 desmin
1292 COL6A2 This gene encodes one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The product of this gene contains several domains similar to von Willebrand Factor type A domains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in this gene are associated with Bethlem myopathy and Ullrich scleroatonic muscular dystrophy. Three transcript variants have been identified for this gene. ENSG00000142173 collagen type VI alpha 2
5620 PRM2 Protamines substitute for histones in the chromatin of sperm during the haploid phase of spermatogenesis, and are the major DNA-binding proteins in the nucleus of sperm in many vertebrates. They package the sperm DNA into a highly condensed complex in a volume less than 5% of a somatic cell nucleus. Many mammalian species have only one protamine (protamine 1); however, a few species, including human and mouse, have two. This gene encodes protamine 2, which is cleaved to give rise to a family of protamine 2 peptides. Alternatively spliced transcript variants have also been found for this gene. ENSG00000122304 protamine 2
3858 KRT10 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. ENSG00000186395 keratin 10
2512 FTL This gene encodes the light subunit of the ferritin protein. Ferritin is the major intracellular iron storage protein in prokaryotes and eukaryotes. It is composed of 24 subunits of the heavy and light ferritin chains. Variation in ferritin subunit composition may affect the rates of iron uptake and release in different tissues. A major function of ferritin is the storage of iron in a soluble and nontoxic state. Defects in this light chain ferritin gene are associated with several neurodegenerative diseases and hyperferritinemia-cataract syndrome. This gene has multiple pseudogenes. ENSG00000087086 ferritin, light polypeptide
4624 MYH6 Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. ENSG00000197616 myosin, heavy chain 6, cardiac muscle, alpha
1832 DSP This gene encodes a protein that anchors intermediate filaments to desmosomal plaques and forms an obligate component of functional desmosomes. Mutations in this gene are the cause of several cardiomyopathies and keratodermas, including skin fragility-woolly hair syndrome. Alternative splicing results in multiple transcript variants. ENSG00000096696 desmoplakin
64065 PERP NA ENSG00000112378 PERP, TP53 apoptosis effector
5619 PRM1 NA ENSG00000175646 protamine 1
1634 DCN This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. ENSG00000011465 decorin
3043 HBB The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. ENSG00000244734 hemoglobin subunit beta
5317 PKP1 This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This protein may be involved in molecular recruitment and stabilization during desmosome formation. Mutations in this gene have been associated with the ectodermal dysplasia/skin fragility syndrome. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000081277 plakophilin 1
2192 FBLN1 Fibulin 1 is a secreted glycoprotein that becomes incorporated into a fibrillar extracellular matrix. Calcium-binding is apparently required to mediate its binding to laminin and nidogen. It mediates platelet adhesion via binding fibrinogen. Four splice variants which differ in the 3’ end have been identified. Each variant encodes a different isoform, but no functional distinctions have been identified among the four variants. ENSG00000077942 fibulin 1
1281 COL3A1 This gene encodes the pro-alpha1 chains of type III collagen, a fibrillar collagen that is found in extensible connective tissues such as skin, lung, uterus, intestine and the vascular system, frequently in association with type I collagen. Mutations in this gene are associated with Ehlers-Danlos syndrome types IV, and with aortic and arterial aneurysms. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. ENSG00000168542 collagen type III alpha 1 chain
1277 COL1A1 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. ENSG00000108821 collagen type I alpha 1
4878 NPPA The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. ENSG00000175206 natriuretic peptide A
3861 KRT14 This gene encodes a member of the keratin family, the most diverse group of intermediate filaments. This gene product, a type I keratin, is usually found as a heterotetramer with two keratin 5 molecules, a type II keratin. Together they form the cytoskeleton of epithelial cells. Mutations in the genes for these keratins are associated with epidermolysis bullosa simplex. At least one pseudogene has been identified at 17p12-p11. ENSG00000186847 keratin 14
3852 KRT5 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the basal layer of the epidermis with family member KRT14. Mutations in these genes have been associated with a complex of diseases termed epidermolysis bullosa simplex. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000186081 keratin 5
3491 CYR61 The secreted protein encoded by this gene is growth factor-inducible and promotes the adhesion of endothelial cells. The encoded protein interacts with several integrins and with heparan sulfate proteoglycan. This protein also plays a role in cell proliferation, differentiation, angiogenesis, apoptosis, and extracellular matrix formation. ENSG00000142871 cysteine rich angiogenic inducer 61
3728 JUP This gene encodes a major cytoplasmic protein which is the only known constituent common to submembranous plaques of both desmosomes and intermediate junctions. This protein forms distinct complexes with cadherins and desmosomal cadherins and is a member of the catenin family since it contains a distinct repeating amino acid motif called the armadillo repeat. Mutation in this gene has been associated with Naxos disease. Alternative splicing occurs in this gene; however, not all transcripts have been fully described. ENSG00000173801 junction plakoglobin
374897 SBSN NA ENSG00000189001 suprabasin
100507347 VIM-AS1 NA ENSG00000229124 VIM antisense RNA 1
1293 COL6A3 This gene encodes the alpha-3 chain, one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The alpha-3 chain of type VI collagen is much larger than the alpha-1 and -2 chains. This difference in size is largely due to an increase in the number of subdomains, similar to von Willebrand Factor type A domains, that are found in the amino terminal globular domain of all the alpha chains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in the type VI collagen genes are associated with Bethlem myopathy, a rare autosomal dominant proximal myopathy with early childhood onset. Mutations in this gene are also a cause of Ullrich congenital muscular dystrophy, also referred to as Ullrich scleroatonic muscular dystrophy, an autosomal recessive congenital myopathy that is more severe than Bethlem myopathy. Multiple transcript variants have been identified, but the full-length nature of only some of these variants has been described. ENSG00000163359 collagen type VI alpha 3 chain
4256 MGP The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. ENSG00000111341 matrix Gla protein
ENSG00000237973 MTCO1P12 NA ENSG00000237973 MT-CO1 pseudogene 12
3849 KRT2 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000172867 keratin 2
710 SERPING1 This gene encodes a highly glycosylated plasma protein involved in the regulation of the complement cascade. Its protein inhibits activated C1r and C1s of the first complement component and thus regulates complement activation. Deficiency of this protein is associated with hereditary angioneurotic oedema (HANE). Alternative splicing results in multiple transcript variants encoding the same isoform. ENSG00000149131 serpin family G member 1
2318 FLNC This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000128591 filamin C
3848 KRT1 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000167768 keratin 1
60 ACTB This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. ENSG00000075624 actin, beta
ENSG00000234961 RP11-124N14.3 NA ENSG00000234961 NA
11167 FSTL1 This gene encodes a protein with similarity to follistatin, an activin-binding protein. It contains an FS module, a follistatin-like sequence containing 10 conserved cysteine residues. This gene product is thought to be an autoantigen associated with rheumatoid arthritis. ENSG00000163430 follistatin like 1
7430 EZR The cytoplasmic peripheral membrane protein encoded by this gene functions as a protein-tyrosine kinase substrate in microvilli. As a member of the ERM protein family, this protein serves as an intermediate between the plasma membrane and the actin cytoskeleton. This protein plays a key role in cell surface structure adhesion, migration and organization, and it has been implicated in various human cancers. A pseudogene located on chromosome 3 has been identified for this gene. Alternatively spliced variants have also been described for this gene. ENSG00000092820 ezrin
284119 PTRF This gene encodes a protein that enables the dissociation of paused ternary polymerase I transcription complexes from the 3’ end of pre-rRNA transcripts. This protein regulates rRNA transcription by promoting the dissociation of transcription complexes and the reinitiation of polymerase I on nascent rRNA transcripts. This protein also localizes to caveolae at the plasma membrane and is thought to play a critical role in the formation of caveolae and the stabilization of caveolins. This protein translocates from caveolae to the cytoplasm after insulin stimulation. Caveolae contain truncated forms of this protein and may be the site of phosphorylation-dependent proteolysis. This protein is also thought to modify lipid metabolism and insulin-regulated gene expression. Mutations in this gene result in a disorder characterized by generalized lipodystrophy and muscular dystrophy. ENSG00000177469 polymerase I and transcript release factor
70 ACTC1 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). ENSG00000159251 actin, alpha, cardiac muscle 1
27129 HSPB7 NA ENSG00000173641 heat shock protein family B (small) member 7
3487 IGFBP4 This gene is a member of the insulin-like growth factor binding protein (IGFBP) family and encodes a protein with an IGFBP domain and a thyroglobulin type-I domain. The protein binds both insulin-like growth factors (IGFs) I and II and circulates in the plasma in both glycosylated and non-glycosylated forms. Binding of this protein prolongs the half-life of the IGFs and alters their interaction with cell surface receptors. ENSG00000141753 insulin like growth factor binding protein 4
4151 MB This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. ENSG00000198125 myoglobin
6712 SPTBN2 Spectrins are principal components of a cell’s membrane-cytoskeleton and are composed of two alpha and two beta spectrin subunits. The protein encoded by this gene (SPTBN2) is called spectrin beta non-erythrocytic 2 or beta-III spectrin. It is related to, but distinct from, the beta-II spectrin gene which is also known as spectrin beta non-erythrocytic 1 (SPTBN1). SPTBN2 regulates the glutamate signaling pathway by stabilizing the glutamate transporter EAAT4 at the surface of the plasma membrane. Mutations in this gene cause a form of spinocerebellar ataxia, SCA5, that is characterized by neurodegeneration, progressive locomotor incoordination, dysarthria, and uncoordinated eye movements. ENSG00000173898 spectrin beta, non-erythrocytic 2
ENSG00000225630 MTND2P28 NA ENSG00000225630 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 2 pseudogene 28
6280 S100A9 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. ENSG00000163220 S100 calcium binding protein A9
715 C1R NA ENSG00000159403 complement C1r subcomponent
ENSG00000211895 IGHA1 NA ENSG00000211895 immunoglobulin heavy constant alpha 1
1410 CRYAB Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. ENSG00000109846 crystallin alpha B
2335 FN1 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. ENSG00000115414 fibronectin 1
7531 YWHAE This gene product belongs to the 14-3-3 family of proteins which mediate signal transduction by binding to phosphoserine-containing proteins. This highly conserved protein family is found in both plants and mammals, and this protein is 100% identical to the mouse ortholog. It interacts with CDC25 phosphatases, RAF1 and IRS1 proteins, suggesting its role in diverse biochemical activities related to signal transduction, such as cell division and regulation of insulin sensitivity. It has also been implicated in the pathogenesis of small cell lung cancer. Two transcript variants, one protein-coding and the other non-protein-coding, have been found for this gene. ENSG00000108953 tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein epsilon
4313 MMP2 This gene is a member of the matrix metalloproteinase (MMP) gene family, whose members are zinc-dependent enzymes capable of cleaving components of the extracellular matrix and molecules involved in signal transduction. The protein encoded by this gene is a gelatinase A, type IV collagenase, that contains three fibronectin type II repeats in its catalytic site that allow binding of denatured type IV and V collagen and elastin. Unlike most MMP family members, activation of this protein can occur on the cell membrane. This enzyme can be activated extracellularly by proteases, or intracellularly by its S-glutathiolation with no requirement for proteolytical removal of the pro-domain. This protein is thought to be involved in multiple pathways including roles in the nervous system, endometrial menstrual breakdown, regulation of vascularization, and metastasis. Mutations in this gene have been associated with Winchester syndrome and Nodulosis-Arthropathy-Osteolysis (NAO) syndrome. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000087245 matrix metallopeptidase 2
57447 NDRG2 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that may play a role in neurite outgrowth. This gene may be involved in glioblastoma carcinogenesis. Several alternatively spliced transcript variants of this gene have been described, but the full-length nature of some of these variants has not been determined. ENSG00000165795 NDRG family member 2
718 C3 Complement component C3 plays a central role in the activation of complement system. Its activation is required for both classical and alternative complement activation pathways. The encoded preproprotein is proteolytically processed to generate alpha and beta subunits that form the mature protein, which is then further processed to generate numerous peptide products. The C3a peptide, also known as the C3a anaphylatoxin, modulates inflammation and possesses antimicrobial activity. Mutations in this gene are associated with atypical hemolytic uremic syndrome and age-related macular degeneration in human patients. ENSG00000125730 complement component 3
5159 PDGFRB This gene encodes a cell surface tyrosine kinase receptor for members of the platelet-derived growth factor family. These growth factors are mitogens for cells of mesenchymal origin. The identity of the growth factor bound to a receptor monomer determines whether the functional receptor is a homodimer or a heterodimer, composed of both platelet-derived growth factor receptor alpha and beta polypeptides. This gene is flanked on chromosome 5 by the genes for granulocyte-macrophage colony-stimulating factor and macrophage-colony stimulating factor receptor; all three genes may be implicated in the 5-q syndrome. A translocation between chromosomes 5 and 12, that fuses this gene to that of the translocation, ETV6, leukemia gene, results in chronic myeloproliferative disorder with eosinophilia. ENSG00000113721 platelet derived growth factor receptor beta
100129518 LOC100129518 NA ENSG00000112096 uncharacterized LOC100129518
6648 SOD2 This gene is a member of the iron/manganese superoxide dismutase family. It encodes a mitochondrial protein that forms a homotetramer and binds one manganese ion per subunit. This protein binds to the superoxide byproducts of oxidative phosphorylation and converts them to hydrogen peroxide and diatomic oxygen. Mutations in this gene have been associated with idiopathic cardiomyopathy (IDC), premature aging, sporadic motor neuron disease, and cancer. Alternative splicing of this gene results in multiple transcript variants. A related pseudogene has been identified on chromosome 1. ENSG00000112096 superoxide dismutase 2, mitochondrial
1490 CTGF The protein encoded by this gene is a mitogen that is secreted by vascular endothelial cells. The encoded protein plays a role in chondrocyte proliferation and differentiation, cell adhesion in many cell types, and is related to platelet-derived growth factor. Certain polymorphisms in this gene have been linked with a higher incidence of systemic sclerosis. ENSG00000118523 connective tissue growth factor
716 C1S This gene encodes a serine protease, which is a major constituent of the human complement subcomponent C1. C1s associates with two other complement components C1r and C1q in order to yield the first component of the serum complement system. Defects in this gene are the cause of selective C1s deficiency. ENSG00000182326 complement component 1, s subcomponent
7139 TNNT2 The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms, however, the full-length nature of some of these variants has not yet been determined. ENSG00000118194 troponin T2, cardiac type
3860 KRT13 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. ENSG00000171401 keratin 13
146330 FBXL16 Members of the F-box protein family, such as FBXL16, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]). ENSG00000127585 F-box and leucine rich repeat protein 16
1303 COL12A1 This gene encodes the alpha chain of type XII collagen, a member of the FACIT (fibril-associated collagens with interrupted triple helices) collagen family. Type XII collagen is a homotrimer found in association with type I collagen, an association that is thought to modify the interactions between collagen I fibrils and the surrounding matrix. Alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000111799 collagen type XII alpha 1 chain
11155 LDB3 This gene encodes a PDZ domain-containing protein. PDZ motifs are modular protein-protein interaction domains consisting of 80-120 amino acid residues. PDZ domain-containing proteins interact with each other in cytoskeletal assembly or with other proteins involved in targeting and clustering of membrane proteins. The protein encoded by this gene interacts with alpha-actinin-2 through its N-terminal PDZ domain and with protein kinase C via its C-terminal LIM domains. The LIM domain is a cysteine-rich motif defined by 50-60 amino acids containing two zinc-binding modules. This protein also interacts with all three members of the myozenin family. Mutations in this gene have been associated with myofibrillar myopathy and dilated cardiomyopathy. Alternatively spliced transcript variants encoding different isoforms have been identified; all isoforms have N-terminal PDZ domains while only longer isoforms (1, 2 and 5) have C-terminal LIM domains. ENSG00000122367 LIM domain binding 3
800 CALD1 This gene encodes a calmodulin- and actin-binding protein that plays an essential role in the regulation of smooth muscle and nonmuscle contraction. The conserved domain of this protein possesses the binding activities to Ca(2+)-calmodulin, actin, tropomyosin, myosin, and phospholipids. This protein is a potent inhibitor of the actin-tropomyosin activated myosin MgATPase, and serves as a mediating factor for Ca(2+)-dependent inhibition of smooth muscle contraction. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. ENSG00000122786 caldesmon 1
1284 COL4A2 This gene encodes one of the six subunits of type IV collagen, the major structural component of basement membranes. The C-terminal portion of the protein, known as canstatin, is an inhibitor of angiogenesis and tumor growth. Like the other members of the type IV collagen gene family, this gene is organized in a head-to-head conformation with another type IV collagen gene so that each gene pair shares a common promoter. ENSG00000134871 collagen type IV alpha 2
3678 ITGA5 The product of this gene belongs to the integrin alpha chain family. Integrins are heterodimeric integral membrane proteins composed of an alpha subunit and a beta subunit that function in cell surface adhesion and signaling. The encoded preproprotein is proteolytically processed to generate light and heavy chains that comprise the alpha 5 subunit. This subunit associates with the beta 1 subunit to form a fibronectin receptor. This integrin may promote tumor invasion, and higher expression of this gene may be correlated with shorter survival time in lung cancer patients. Note that the integrin alpha 5 and integrin alpha V subunits are encoded by distinct genes. ENSG00000161638 integrin subunit alpha 5
3040 HBA2 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. ENSG00000188536 hemoglobin subunit alpha 2
151887 CCDC80 NA ENSG00000091986 coiled-coil domain containing 80
84033 OBSCN The obscurin gene spans more than 150 kb, contains over 80 exons and encodes a protein of approximately 720 kDa. The encoded protein contains 68 Ig domains, 2 fibronectin domains, 1 calcium/calmodulin-binding domain, 1 RhoGEF domain with an associated PH domain, and 2 serine-threonine kinase domains. This protein belongs to the family of giant sarcomeric signaling proteins that includes titin and nebulin, and may have a role in the organization of myofibrils during assembly and may mediate interactions between the sarcoplasmic reticulum and myofibrils. Alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000154358 obscurin, cytoskeletal calmodulin and titin-interacting RhoGEF
6279 S100A8 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and as a cytokine. Altered expression of this protein is associated with the disease cystic fibrosis. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000143546 S100 calcium binding protein A8
1278 COL1A2 This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. ENSG00000164692 collagen type I alpha 2 chain
3557 IL1RN The protein encoded by this gene is a member of the interleukin 1 cytokine family. This protein inhibits the activities of interleukin 1, alpha (IL1A) and interleukin 1, beta (IL1B), and modulates a variety of interleukin 1 related immune and inflammatory responses. This gene and five other closely related cytokine genes form a gene cluster spanning approximately 400 kb on chromosome 2. A polymorphism of this gene is reported to be associated with increased risk of osteoporotic fractures and gastric cancer. Several alternatively spliced transcript variants encoding distinct isoforms have been reported. ENSG00000136689 interleukin 1 receptor antagonist
23650 TRIM29 The protein encoded by this gene belongs to the TRIM protein family. It has multiple zinc finger motifs and a leucine zipper motif. It has been proposed to form homo- or heterodimers which are involved in nucleic acid binding. Thus, it may act as a transcriptional regulatory factor involved in carcinogenesis and/or differentiation. It may also function in the suppression of radiosensitivity since it is associated with ataxia telangiectasia phenotype. ENSG00000137699 tripartite motif containing 29
7273 TTN This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. ENSG00000155657 titin
7538 ZFP36 NA ENSG00000128016 ZFP36 ring finger protein
6876 TAGLN The protein encoded by this gene is a transformation and shape-change sensitive actin cross-linking/gelling protein found in fibroblasts and smooth muscle. Its expression is down-regulated in many cell lines, and this down-regulation may be an early and sensitive marker for the onset of transformation. A functional role of this protein is unclear. Two transcript variants encoding the same protein have been found for this gene. ENSG00000149591 transgelin
7045 TGFBI This gene encodes an RGD-containing protein that binds to type I, II and IV collagens. The RGD motif is found in many extracellular matrix proteins modulating cell adhesion and serves as a ligand recognition sequence for several integrins. This protein plays a role in cell-collagen interactions and may be involved in endochondral bone formation in cartilage. The protein is induced by transforming growth factor-beta and acts to inhibit cell adhesion. Mutations in this gene are associated with multiple types of corneal dystrophy. ENSG00000120708 transforming growth factor beta induced
3339 HSPG2 This gene encodes the perlecan protein, which consists of a core protein to which three long chains of glycosaminoglycans (heparan sulfate or chondroitin sulfate) are attached. The perlecan protein is a large multidomain proteoglycan that binds to and cross-links many extracellular matrix components and cell-surface molecules. It has been shown that this protein interacts with laminin, prolargin, collagen type IV, FGFBP1, FBLN2, FGF7 and transthyretin, etc., and it plays essential roles in multiple biological activities. Perlecan is a key component of the vascular extracellular matrix, where it helps to maintain the endothelial barrier function. It is a potent inhibitor of smooth muscle cell proliferation and is thus thought to help maintain vascular homeostasis. It can also promote growth factor (e.g., FGF2) activity and thus stimulate endothelial growth and re-generation. It is a major component of basement membranes, where it is involved in the stabilization of other molecules as well as being involved with glomerular permeability to macromolecules and cell adhesion. Mutations in this gene cause Schwartz-Jampel syndrome type 1, Silverman-Handmaker type of dyssegmental dysplasia, and tardive dyskinesia. Alternative splicing of this gene results in multiple transcript variants. ENSG00000142798 heparan sulfate proteoglycan 2
730 C7 C7 is a component of the complement system. It participates in the formation of Membrane Attack Complex (MAC). People with C7 deficiency are prone to bacterial infection. ENSG00000112936 complement component 7
65018 PINK1 This gene encodes a serine/threonine protein kinase that localizes to mitochondria. It is thought to protect cells from stress-induced mitochondrial dysfunction. Mutations in this gene cause one form of autosomal recessive early-onset Parkinson disease. ENSG00000158828 PTEN induced putative kinase 1
4627 MYH9 This gene encodes a conventional non-muscle myosin; this protein should not be confused with the unconventional myosin-9a or 9b (MYO9A or MYO9B). The encoded protein is a myosin IIA heavy chain that contains an IQ domain and a myosin head-like domain which is involved in several important functions, including cytokinesis, cell motility and maintenance of cell shape. Defects in this gene have been associated with non-syndromic sensorineural deafness autosomal dominant type 17, Epstein syndrome, Alport syndrome with macrothrombocytopenia, Sebastian syndrome, Fechtner syndrome and macrothrombocytopenia with progressive sensorineural deafness. ENSG00000100345 myosin, heavy chain 9, non-muscle
3312 HSPA8 This gene encodes a member of the heat shock protein 70 family, which contains both heat-inducible and constitutively expressed members. This protein belongs to the latter group, which are also referred to as heat-shock cognate proteins. It functions as a chaperone, and binds to nascent polypeptides to facilitate correct folding. It also functions as an ATPase in the disassembly of clathrin-coated vesicles during transport of membrane components through the cell. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000109971 heat shock protein family A (Hsp70) member 8
23095 KIF1B This gene encodes a motor protein that transports mitochondria and synaptic vesicle precursors. Mutations in this gene cause Charcot-Marie-Tooth disease, type 2A1. ENSG00000054523 kinesin family member 1B
7094 TLN1 This gene encodes a cytoskeletal protein that is concentrated in areas of cell-substratum and cell-cell contacts. The encoded protein plays a significant role in the assembly of actin filaments and in spreading and migration of various cell types, including fibroblasts and osteoclasts. It codistributes with integrins in the cell surface membrane in order to assist in the attachment of adherent cells to extracellular matrices and of lymphocytes to other cells. The N-terminus of this protein contains elements for localization to cell-extracellular matrix junctions. The C-terminus contains binding sites for proteins such as beta-1-integrin, actin, and vinculin. ENSG00000137076 talin 1
27063 ANKRD1 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. ENSG00000148677 ankyrin repeat domain 1
5037 PEBP1 This gene encodes a member of the phosphatidylethanolamine-binding family of proteins and has been shown to modulate multiple signaling pathways, including the MAP kinase (MAPK), NF-kappa B, and glycogen synthase kinase-3 (GSK-3) signaling pathways. The encoded protein can be further processed to form a smaller cleavage product, hippocampal cholinergic neurostimulating peptide (HCNP), which may be involved in neural development. This gene has been implicated in numerous human cancers and may act as a metastasis suppressor gene. Multiple pseudogenes of this gene have been identified in the genome. ENSG00000089220 phosphatidylethanolamine binding protein 1
4604 MYBPC1 This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000196091 myosin binding protein C, slow type
79901 CYBRD1 This gene is a member of the cytochrome b(561) family that encodes an iron-regulated protein. It is highly expressed in the duodenal brush border membrane. It has ferric reductase activity and is believed to play a physiological role in dietary iron absorption. ENSG00000071967 cytochrome b reductase 1
58498 MYL7 NA ENSG00000106631 myosin light chain 7
6277 S100A6 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in stimulation of Ca2+-dependent insulin release, stimulation of prolactin secretion, and exocytosis. Chromosomal rearrangements and altered expression of this gene have been implicated in melanoma. ENSG00000197956 S100 calcium binding protein A6
10398 MYL9 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000101335 myosin light chain 9
5493 PPL The protein encoded by this gene is a component of desmosomes and of the epidermal cornified envelope in keratinocytes. The N-terminal domain of this protein interacts with the plasma membrane and its C-terminus interacts with intermediate filaments. Through its rod domain, this protein forms complexes with envoplakin. This protein may serve as a link between the cornified envelope and desmosomes as well as intermediate filaments. AKT1/PKB, a protein kinase mediating a variety of cell growth and survival signaling processes, is reported to interact with this protein, suggesting a possible role for this protein as a localization signal in AKT1-mediated signaling. ENSG00000118898 periplakin
4625 MYH7 Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. ENSG00000092054 myosin, heavy chain 7, cardiac muscle, beta
7048 TGFBR2 This gene encodes a member of the Ser/Thr protein kinase family and the TGFB receptor subfamily. The encoded protein is a transmembrane protein that has a protein kinase domain, forms a heterodimeric complex with another receptor protein, and binds TGF-beta. This receptor/ligand complex phosphorylates proteins, which then enter the nucleus and regulate the transcription of a subset of genes related to cell proliferation. Mutations in this gene have been associated with Marfan Syndrome, Loeys-Dietz Aortic Aneurysm Syndrome, and the development of various types of tumors. Alternatively spliced transcript variants encoding different isoforms have been characterized. ENSG00000163513 transforming growth factor beta receptor 2
857 CAV1 The scaffolding protein encoded by this gene is the main component of the caveolae plasma membranes found in most cell types. The protein links integrin subunits to the tyrosine kinase FYN, an initiating step in coupling integrins to the Ras-ERK pathway and promoting cell cycle progression. The gene is a tumor suppressor gene candidate and a negative regulator of the Ras-p42/44 mitogen-activated kinase cascade. Caveolin 1 and caveolin 2 are located next to each other on chromosome 7 and express colocalizing proteins that form a stable hetero-oligomeric complex. Mutations in this gene have been associated with Berardinelli-Seip congenital lipodystrophy. Alternatively spliced transcripts encode alpha and beta isoforms of caveolin 1. ENSG00000105974 caveolin 1
65009 NDRG4 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that is required for cell cycle progression and survival in primary astrocytes and may be involved in the regulation of mitogenic signalling in vascular smooth muscle cells. Alternative splicing results in multiple transcripts encoding different isoforms. ENSG00000103034 NDRG family member 4
ENSG00000211896 IGHG1 NA ENSG00000211896 immunoglobulin heavy constant gamma 1 (G1m marker)
3326 HSP90AB1 This gene encodes a member of the heat shock protein 90 family; these proteins are involved in signal transduction, protein folding and degradation and morphological evolution. This gene encodes the constitutive form of the cytosolic 90 kDa heat-shock protein and is thought to play a role in gastric apoptosis and inflammation. Alternative splicing results in multiple transcript variants. Pseudogenes have been identified on multiple chromosomes. ENSG00000096384 heat shock protein 90kDa alpha family class B member 1
2202 EFEMP1 This gene encodes a member of the fibulin family of extracellular matrix glycoproteins. Like all members of this family, the encoded protein contains tandemly repeated epidermal growth factor-like repeats followed by a C-terminus fibulin-type domain. This gene is upregulated in malignant gliomas and may play a role in the aggressive nature of these tumors. Mutations in this gene are associated with Doyne honeycomb retinal dystrophy. Alternatively spliced transcript variants that encode the same protein have been described. ENSG00000115380 EGF containing fibulin like extracellular matrix protein 1
5502 PPP1R1A NA ENSG00000135447 protein phosphatase 1 regulatory inhibitor subunit 1A
79085 SLC25A23 NA ENSG00000125648 solute carrier family 25 member 23
84525 HOPX The protein encoded by this gene is a homeodomain protein that lacks certain conserved residues required for DNA binding. It was reported that choriocarcinoma cell lines and tissues failed to express this gene, which suggested the possible involvement of this gene in malignant conversion of placental trophoblasts. Studies in mice suggest that this protein may interact with serum response factor (SRF) and modulate SRF-dependent cardiac-specific gene expression and cardiac development. Multiple alternatively spliced transcript variants have been identified for this gene. ENSG00000171476 HOP homeobox
6515 SLC2A3 NA ENSG00000059804 solute carrier family 2 member 3
3320 HSP90AA1 The protein encoded by this gene is an inducible molecular chaperone that functions as a homodimer. The encoded protein aids in the proper folding of specific target proteins by use of an ATPase activity that is modulated by co-chaperones. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000080824 heat shock protein 90kDa alpha family class A member 1
# Save the Ensembl IDs queried for factor 12, one per line
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_", 12, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
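
The query-then-write step is repeated for every factor, and queryMany notes that duplicate query terms can occur, so one Ensembl ID may appear on several rows of `out`. As a minimal sketch, assuming one ID per line is what the downstream files need, the write step could be wrapped in a helper that drops repeated IDs first. The function name `write_factor_genes`, its arguments, and the toy data frame are hypothetical and not part of the original analysis; unlike the call above, this variant deduplicates.

```r
# Hypothetical helper (not in the original analysis): write the unique
# query IDs of one factor's annotation table to a text file.
write_factor_genes <- function(out, k, dir) {
  # A single Ensembl ID can match several records, so keep each ID once
  ids <- unique(as.character(out$query))
  path <- file.path(dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(ids, path, col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Toy stand-in for a queryMany() result; the second ID matches two records
toy <- data.frame(
  query  = c("ENSG00000155657", "ENSG00000237298", "ENSG00000237298"),
  symbol = c("TTN", "LOC101927055", "TTN-AS1"),
  stringsAsFactors = FALSE
)

path <- write_factor_genes(toy, 13, tempdir())
written <- readLines(path)
```

In the analysis itself the call would be something like `write_factor_genes(out, 12, "../utilities/GTEX2013_sparse_load_sqrt")`. Whether duplicates should be dropped depends on how the gene lists are used later.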

Factor 13 Annotations

out <- mygene::queryMany(gene_list[13,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
summary X_id symbol name query
This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. 1674 DES desmin ENSG00000175084
The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. 3043 HBB hemoglobin subunit beta ENSG00000244734
Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a muscle-specific, alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Several transcript variants encoding different isoforms have been found for this gene. 88 ACTN2 actinin alpha 2 ENSG00000077522
The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. 3860 KRT13 keratin 13 ENSG00000171401
Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. 4625 MYH7 myosin, heavy chain 7, cardiac muscle, beta ENSG00000092054
This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol into the sarcoplasmic reticulum lumen, and is involved in regulation of the contraction/relaxation cycle. Mutations in this gene cause Darier-White disease, also known as keratosis follicularis, an autosomal dominant skin disorder characterized by loss of adhesion between epidermal cells and abnormal keratinization. Alternative splicing results in multiple transcript variants encoding different isoforms. 488 ATP2A2 ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 2 ENSG00000174437
The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. 58 ACTA1 actin, alpha 1, skeletal muscle ENSG00000143632
This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. 1634 DCN decorin ENSG00000011465
This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. 7273 TTN titin ENSG00000155657
The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. 1158 CKM creatine kinase, M-type ENSG00000104879
This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene. 2318 FLNC filamin C ENSG00000128591
This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. 4151 MB myoglobin ENSG00000198125
This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. 60 ACTB actin, beta ENSG00000075624
This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. 2335 FN1 fibronectin 1 ENSG00000115414
NA 101927055 LOC101927055 uncharacterized LOC101927055 ENSG00000237298
NA 100506866 TTN-AS1 TTN antisense RNA 1 ENSG00000237298
This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 4604 MYBPC1 myosin binding protein C, slow type ENSG00000196091
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3851 KRT4 keratin 4 ENSG00000170477
NA 8531 YBX3 Y-box binding protein 3 ENSG00000060138
The protein encoded by this gene, Aldolase A (fructose-bisphosphate aldolase), is a glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Three aldolase isozymes (A, B, and C), encoded by three different genes, are differentially expressed during development. Aldolase A is found in the developing embryo and is produced in even greater amounts in adult muscle. Aldolase A expression is repressed in adult liver, kidney and intestine and similar to aldolase C levels in brain and other nervous tissue. Aldolase A deficiency has been associated with myopathy and hemolytic anemia. Alternative splicing and alternative promoter usage results in multiple transcript variants. Related pseudogenes have been identified on chromosomes 3 and 10. 226 ALDOA aldolase, fructose-bisphosphate A ENSG00000149925
This gene encodes one of the three enolase isoenzymes found in mammals. This isoenzyme is found in skeletal muscle cells in the adult where it may play a role in muscle development and regeneration. A switch from alpha enolase to beta enolase occurs in muscle tissue during development in rodents. Mutations in this gene have been associated with glycogen storage disease. Alternatively spliced transcript variants encoding different isoforms have been described. 2027 ENO3 enolase 3 ENSG00000108515
The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. 4878 NPPA natriuretic peptide A ENSG00000175206
This gene is a member of the mitochondrial carrier subfamily of solute carrier protein genes. The product of this gene functions as a gated pore that translocates ADP from the cytoplasm into the mitochondrial matrix and ATP from the mitochondrial matrix into the cytoplasm. The protein forms a homodimer embedded in the inner mitochondrial membrane. Mutations in this gene have been shown to result in autosomal dominant progressive external ophthalmoplegia and familial hypertrophic cardiomyopathy. 291 SLC25A4 solute carrier family 25 member 4 ENSG00000151729
Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). 70 ACTC1 actin, alpha, cardiac muscle 1 ENSG00000159251
The giant protein titin, together with its associated proteins, interconnects the major structure of sarcomeres, the M bands and Z discs. The C-terminal end of the titin string extends into the M line, where it binds tightly to M-band constituents of apparent molecular masses of 190 kD and 165 kD. The predicted MYOM2 protein contains 1,465 amino acids. Like MYOM1, MYOM2 has a unique N-terminal domain followed by 12 repeat domains with strong homology to either fibronectin type III or immunoglobulin C2 domains. Protein sequence comparisons suggested that the MYOM2 protein and bovine M protein are identical. 9172 MYOM2 myomesin 2 ENSG00000036448
NA 100129518 LOC100129518 uncharacterized LOC100129518 ENSG00000112096
This gene is a member of the iron/manganese superoxide dismutase family. It encodes a mitochondrial protein that forms a homotetramer and binds one manganese ion per subunit. This protein binds to the superoxide byproducts of oxidative phosphorylation and converts them to hydrogen peroxide and diatomic oxygen. Mutations in this gene have been associated with idiopathic cardiomyopathy (IDC), premature aging, sporadic motor neuron disease, and cancer. Alternative splicing of this gene results in multiple transcript variants. A related pseudogene has been identified on chromosome 1. 6648 SOD2 superoxide dismutase 2, mitochondrial ENSG00000112096
NA 27129 HSPB7 heat shock protein family B (small) member 7 ENSG00000173641
This gene encodes a PDZ domain-containing protein. PDZ motifs are modular protein-protein interaction domains consisting of 80-120 amino acid residues. PDZ domain-containing proteins interact with each other in cytoskeletal assembly or with other proteins involved in targeting and clustering of membrane proteins. The protein encoded by this gene interacts with alpha-actinin-2 through its N-terminal PDZ domain and with protein kinase C via its C-terminal LIM domains. The LIM domain is a cysteine-rich motif defined by 50-60 amino acids containing two zinc-binding modules. This protein also interacts with all three members of the myozenin family. Mutations in this gene have been associated with myofibrillar myopathy and dilated cardiomyopathy. Alternatively spliced transcript variants encoding different isoforms have been identified; all isoforms have N-terminal PDZ domains while only longer isoforms (1, 2 and 5) have C-terminal LIM domains. 11155 LDB3 LIM domain binding 3 ENSG00000122367
This gene encodes a muscle enzyme involved in glycogenolysis. Highly similar enzymes encoded by different genes are found in liver and brain. Mutations in this gene are associated with McArdle disease (myophosphorylase deficiency), a glycogen storage disease of muscle. Alternative splicing results in multiple transcript variants. 5837 PYGM phosphorylase, glycogen, muscle ENSG00000068976
Sarcomere assembly is regulated by the muscle protein titin. Titin is a giant elastic protein with kinase activity that extends half the length of a sarcomere. It serves as a scaffold to which myofibrils and other muscle related proteins are attached. This gene encodes a protein found in striated and cardiac muscle that binds to the titin Z1-Z2 domains and is a substrate of titin kinase, interactions thought to be critical to sarcomere assembly. Mutations in this gene are associated with limb-girdle muscular dystrophy type 2G. 8557 TCAP titin-cap ENSG00000173991
NA 6707 SPRR3 small proline rich protein 3 ENSG00000163209
This gene encodes a member of the myosin superfamily. The protein represents a conventional non-muscle myosin; it should not be confused with the unconventional myosin-10 (MYO10). Myosins are actin-dependent motor proteins with diverse functions including regulation of cytokinesis, cell motility, and cell polarity. Mutations in this gene have been associated with May-Hegglin anomaly and developmental defects in brain and heart. Multiple transcript variants encoding different isoforms have been found for this gene. 4628 MYH10 myosin, heavy chain 10, non-muscle ENSG00000133026
The giant protein titin, together with its associated proteins, interconnects the major structure of sarcomeres, the M bands and Z discs. The C-terminal end of the titin string extends into the M line, where it binds tightly to M-band constituents of apparent molecular masses of 190 kD (myomesin 1) and 165 kD (myomesin 2). This protein, myomesin 1, like myomesin 2, titin, and other myofibrillar proteins contains structural modules with strong homology to either fibronectin type III (motif I) or immunoglobulin C2 (motif II) domains. Myomesin 1 and myomesin 2 each have a unique N-terminal region followed by 12 modules of motif I or motif II, in the arrangement II-II-I-I-I-I-I-II-II-II-II-II. The two proteins share 50% sequence identity in this repeat-containing region. The head structure formed by these 2 proteins on one end of the titin string extends into the center of the M band. The integrating structure of the sarcomere arises from muscle-specific members of the superfamily of immunoglobulin-like proteins. Alternatively spliced transcript variants encoding different isoforms have been identified. 8736 MYOM1 myomesin 1 ENSG00000101605
Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. 7038 TG thyroglobulin ENSG00000042832
The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms, however, the full-length nature of some of these variants has not yet been determined. 7139 TNNT2 troponin T2, cardiac type ENSG00000118194
This gene is a member of the aggrecan/versican proteoglycan family. The protein encoded is a large chondroitin sulfate proteoglycan and is a major component of the extracellular matrix. This protein is involved in cell adhesion, proliferation, migration and angiogenesis and plays a central role in tissue morphogenesis and maintenance. Mutations in this gene are the cause of Wagner syndrome type 1. Multiple transcript variants encoding different isoforms have been found for this gene. 1462 VCAN versican ENSG00000038427
NA 202333 CMYA5 cardiomyopathy associated 5 ENSG00000164309
This gene is a member of the PDK/BCKDK protein kinase family and encodes a mitochondrial protein with a histidine kinase domain. This protein is located in the matrix of the mitochondria and inhibits the pyruvate dehydrogenase complex by phosphorylating one of its subunits, thereby contributing to the regulation of glucose metabolism. Expression of this gene is regulated by glucocorticoids, retinoic acid and insulin. 5166 PDK4 pyruvate dehydrogenase kinase 4 ENSG00000004799
This gene encodes a cytoskeletal LIM protein that binds to actin filaments via a domain that is homologous to erythrocyte dematin. LIM domains, found in over 60 proteins, play key roles in the regulation of developmental pathways. LIM domains also function as protein-binding interfaces, mediating specific protein-protein interactions. The protein encoded by this gene could mediate such interactions between actin filaments and cytoplasmic targets. Alternatively spliced transcript variants encoding different isoforms have been identified. 3983 ABLIM1 actin binding LIM protein 1 ENSG00000099204
This gene encodes the regulatory light chain associated with cardiac myosin beta (or slow) heavy chain. Ca2+ triggers the phosphorylation of regulatory light chain that in turn triggers contraction. Mutations in this gene are associated with mid-left ventricular chamber type hypertrophic cardiomyopathy. 4633 MYL2 myosin light chain 2 ENSG00000111245
MYL3 encodes myosin light chain 3, an alkali light chain also referred to in the literature as both the ventricular isoform and the slow skeletal muscle isoform. Mutations in MYL3 have been identified as a cause of mid-left ventricular chamber type hypertrophic cardiomyopathy. 4634 MYL3 myosin light chain 3 ENSG00000160808
The intracellular fatty acid-binding proteins (FABPs) belong to a multigene family. FABPs are divided into at least three distinct types, namely the hepatic-, intestinal- and cardiac-type. They form 14-15 kDa proteins and are thought to participate in the uptake, intracellular metabolism and/or transport of long-chain fatty acids. They may also be responsible for the modulation of cell growth and proliferation. Fatty acid-binding protein 3 gene contains four exons and its function is to arrest growth of mammary epithelial cells. This gene is a candidate tumor suppressor gene for human breast cancer. Alternative splicing results in multiple transcript variants. 2170 FABP3 fatty acid binding protein 3 ENSG00000121769
NA ENSG00000242349 NPPA-AS1 NPPA antisense RNA 1 ENSG00000242349
The protein encoded by this gene belongs to the superfamily of small heat-shock proteins containing a conservative alpha-crystallin domain at the C-terminal part of the molecule. The expression of this gene is induced by estrogen in estrogen receptor-positive breast cancer cells, and this protein also functions as a chaperone in association with Bag3, a stimulator of macroautophagy. Thus, this gene appears to be involved in regulation of cell proliferation, apoptosis, and carcinogenesis, and mutations in this gene have been associated with different neuromuscular diseases, including Charcot-Marie-Tooth disease. 26353 HSPB8 heat shock protein family B (small) member 8 ENSG00000152137
The protein encoded by this gene serves to anchor phosphodiesterase 4D to the Golgi/centrosome region of the cell. Defects in this gene may be a cause of myeloproliferative disorder (MBD) associated with eosinophilia. Several transcript variants encoding different isoforms have been found for this gene. 9659 PDE4DIP phosphodiesterase 4D interacting protein ENSG00000178104
This gene belongs to the TIMP gene family. The proteins encoded by this gene family are inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix (ECM). Expression of this gene is induced in response to mitogenic stimulation and this netrin domain-containing protein is localized to the ECM. Mutations in this gene have been associated with the autosomal dominant disorder Sorsby’s fundus dystrophy. 7078 TIMP3 TIMP metallopeptidase inhibitor 3 ENSG00000100234
The protein encoded by this gene is an inducible molecular chaperone that functions as a homodimer. The encoded protein aids in the proper folding of specific target proteins by use of an ATPase activity that is modulated by co-chaperones. Two transcript variants encoding different isoforms have been found for this gene. 3320 HSP90AA1 heat shock protein 90kDa alpha family class A member 1 ENSG00000080824
Mitochondrial creatine kinase (MtCK) is responsible for the transfer of high energy phosphate from mitochondria to the cytosolic carrier, creatine. It belongs to the creatine kinase isoenzyme family. It exists as two isoenzymes, sarcomeric MtCK and ubiquitous MtCK, encoded by separate genes. Mitochondrial creatine kinase occurs in two different oligomeric forms: dimers and octamers, in contrast to the exclusively dimeric cytosolic creatine kinase isoenzymes. Sarcomeric mitochondrial creatine kinase has 80% homology with the coding exons of ubiquitous mitochondrial creatine kinase. This gene contains sequences homologous to several motifs that are shared among some nuclear genes encoding mitochondrial proteins and thus may be essential for the coordinated activation of these genes during mitochondrial biogenesis. Three transcript variants encoding the same protein have been found for this gene. 1160 CKMT2 creatine kinase, mitochondrial 2 ENSG00000131730
The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in smooth muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. 59 ACTA2 actin, alpha 2, smooth muscle, aorta ENSG00000107796
This gene is a member of the matrix metalloproteinase (MMP) gene family, that are zinc-dependent enzymes capable of cleaving components of the extracellular matrix and molecules involved in signal transduction. The protein encoded by this gene is a gelatinase A, type IV collagenase, that contains three fibronectin type II repeats in its catalytic site that allow binding of denatured type IV and V collagen and elastin. Unlike most MMP family members, activation of this protein can occur on the cell membrane. This enzyme can be activated extracellularly by proteases, or intracellularly by its S-glutathiolation with no requirement for proteolytical removal of the pro-domain. This protein is thought to be involved in multiple pathways including roles in the nervous system, endometrial menstrual breakdown, regulation of vascularization, and metastasis. Mutations in this gene have been associated with Winchester syndrome and Nodulosis-Arthropathy-Osteolysis (NAO) syndrome. Alternative splicing results in multiple transcript variants encoding different isoforms. 4313 MMP2 matrix metallopeptidase 2 ENSG00000087245
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. As many as six of this type II cytokeratin (KRT6) have been identified; the multiplicity of the genes is attributed to successive gene duplication events. The genes are expressed with family members KRT16 and/or KRT17 in the filiform papillae of the tongue, the stratified epithelial lining of oral mucosa and esophagus, the outer root sheath of hair follicles, and the glandular epithelia. This KRT6 gene in particular encodes the most abundant isoform. Mutations in these genes have been associated with pachyonychia congenita. In addition, peptides from the C-terminal region of the protein have antimicrobial activity against bacterial pathogens. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3853 KRT6A keratin 6A ENSG00000205420
The protein encoded by this gene is the receptor for colony stimulating factor 3, a cytokine that controls the production, differentiation, and function of granulocytes. The encoded protein, which is a member of the family of cytokine receptors, may also function in some cell surface adhesion or recognition processes. Alternatively spliced transcript variants have been described. Mutations in this gene are a cause of Kostmann syndrome, also known as severe congenital neutropenia. 1441 CSF3R colony stimulating factor 3 receptor ENSG00000119535
Troponin is a central regulatory protein of striated muscle contraction, and together with tropomyosin, is located on the actin filament. Troponin consists of 3 subunits: TnI, which is the inhibitor of actomyosin ATPase; TnT, which contains the binding site for tropomyosin; and TnC, the protein encoded by this gene. The binding of calcium to TnC abolishes the inhibitory action of TnI, thus allowing the interaction of actin with myosin, the hydrolysis of ATP, and the generation of tension. Mutations in this gene are associated with cardiomyopathy dilated type 1Z. 7134 TNNC1 troponin C1, slow skeletal and cardiac type ENSG00000114854
The protein encoded by this gene is an isozyme of phosphoglucomutase (PGM) and belongs to the phosphohexose mutase family. There are several PGM isozymes, which are encoded by different genes and catalyze the transfer of phosphate between the 1 and 6 positions of glucose. In most cell types, this PGM isozyme is predominant, representing about 90% of total PGM activity. In red cells, PGM2 is a major isozyme. This gene is highly polymorphic. Mutations in this gene cause glycogen storage disease type 14. Alternatively spliced transcript variants encoding different isoforms have been identified in this gene. 5236 PGM1 phosphoglucomutase 1 ENSG00000079739
The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may be also involved in the regulation of non-rapid eye movement sleep. 5730 PTGDS prostaglandin D2 synthase ENSG00000107317
The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. 2194 FASN fatty acid synthase ENSG00000169710
This gene encodes a serine protease, which is a major constituent of the human complement subcomponent C1. C1s associates with two other complement components C1r and C1q in order to yield the first component of the serum complement system. Defects in this gene are the cause of selective C1s deficiency. 716 C1S complement component 1, s subcomponent ENSG00000182326
The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. 27063 ANKRD1 ankyrin repeat domain 1 ENSG00000148677
Integrins are heterodimeric transmembrane receptor proteins that mediate numerous cellular processes including cell adhesion, cytoskeletal rearrangement, and activation of cell signaling pathways. Integrins are composed of alpha and beta subunits. This gene encodes the alpha 8 subunit of the heterodimeric integrin alpha8beta1 protein. The encoded protein is a single-pass type 1 membrane protein that contains multiple FG-GAP repeats. This repeat is predicted to fold into a beta propeller structure. This gene regulates the recruitment of mesenchymal cells into epithelial structures, mediates cell-cell interactions, and regulates neurite outgrowth of sensory and motor neurons. The integrin alpha8beta1 protein thus plays an important role in wound-healing and organogenesis. Mutations in this gene have been associated with renal hypodysplasia/aplasia-1 (RHDA1) and with several animal models of chronic kidney disease. Alternate splicing results in multiple transcript variants encoding distinct isoforms. 8516 ITGA8 integrin subunit alpha 8 ENSG00000077943
This gene encodes the beta subunit of the mitochondrial trifunctional protein, which catalyzes the last three steps of mitochondrial beta-oxidation of long chain fatty acids. The mitochondrial membrane-bound heterocomplex is composed of four alpha and four beta subunits, with the beta subunit catalyzing the 3-ketoacyl-CoA thiolase activity. The encoded protein can also bind RNA and decreases the stability of some mRNAs. The genes of the alpha and beta subunits of the mitochondrial trifunctional protein are located adjacent to each other in the human genome in a head-to-head orientation. Mutations in this gene result in trifunctional protein deficiency. Alternatively spliced transcript variants encoding different isoforms have been described. 3032 HADHB hydroxyacyl-CoA dehydrogenase/3-ketoacyl-CoA thiolase/enoyl-CoA hydratase (trifunctional protein), beta subunit ENSG00000138029
NA 58476 TP53INP2 tumor protein p53 inducible nuclear protein 2 ENSG00000078804
This gene is a member of the TIMP gene family. The proteins encoded by this gene family are natural inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix. In addition to an inhibitory role against metalloproteinases, the encoded protein has a unique role among TIMP family members in its ability to directly suppress the proliferation of endothelial cells. As a result, the encoded protein may be critical to the maintenance of tissue homeostasis by suppressing the proliferation of quiescent tissues in response to angiogenic factors, and by inhibiting protease activity in tissues undergoing remodelling of the extracellular matrix. 7077 TIMP2 TIMP metallopeptidase inhibitor 2 ENSG00000035862
The protein encoded by this gene is a member of the thiol-specific antioxidant protein family. This protein is a bifunctional enzyme with two distinct active sites. It is involved in redox regulation of the cell; it can reduce H(2)O(2) and short chain organic, fatty acid, and phospholipid hydroperoxides. It may play a role in the regulation of phospholipid turnover as well as in protection against oxidative injury. 9588 PRDX6 peroxiredoxin 6 ENSG00000117592
NA ENSG00000229732 AC019349.5 NA ENSG00000229732
The protein encoded by this gene belongs to the family of latent TGF-beta binding proteins (LTBPs). The secretion and activation of TGF-betas is regulated by their association with latency-associated proteins and with latent TGF-beta binding proteins. The product of this gene targets latent complexes of transforming growth factor beta to the extracellular matrix, where the latent cytokine is subsequently activated by several different mechanisms. Alternatively spliced transcript variants encoding different isoforms have been identified. 4052 LTBP1 latent transforming growth factor beta binding protein 1 ENSG00000049323
The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. 3039 HBA1 hemoglobin subunit alpha 1 ENSG00000206172
Acetyl-CoA carboxylase (ACC) is a complex multifunctional enzyme system. ACC is a biotin-containing enzyme which catalyzes the carboxylation of acetyl-CoA to malonyl-CoA, the rate-limiting step in fatty acid synthesis. ACC-beta is thought to control fatty acid oxidation by means of the ability of malonyl-CoA to inhibit carnitine-palmitoyl-CoA transferase I, the rate-limiting step in fatty acid uptake and oxidation by mitochondria. ACC-beta may be involved in the regulation of fatty acid oxidation, rather than fatty acid biosynthesis. There is evidence for the presence of two ACC-beta isoforms. 32 ACACB acetyl-CoA carboxylase beta ENSG00000076555
NA 6515 SLC2A3 solute carrier family 2 member 3 ENSG00000059804
MYBPC3 encodes the cardiac isoform of myosin-binding protein C. Myosin-binding protein C is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. MYBPC3, the cardiac isoform, is expressed exclusively in heart muscle. Regulatory phosphorylation of the cardiac isoform in vivo by cAMP-dependent protein kinase (PKA) upon adrenergic stimulation may be linked to modulation of cardiac contraction. Mutations in MYBPC3 are one cause of familial hypertrophic cardiomyopathy. 4607 MYBPC3 myosin binding protein C, cardiac ENSG00000134571
This gene encodes nebulin, a giant protein component of the cytoskeletal matrix that coexists with the thick and thin filaments within the sarcomeres of skeletal muscle. In most vertebrates, nebulin accounts for 3 to 4% of the total myofibrillar protein. The encoded protein contains approximately 30-amino acid long modules that can be classified into 7 types and other repeated modules. Protein isoform sizes vary from 600 to 800 kD due to alternative splicing that is tissue-, species-, and developmental stage-specific. Of the 183 exons in the nebulin gene, at least 43 are alternatively spliced, although exons 143 and 144 are not found in the same transcript. Of the several thousand transcript variants predicted for nebulin, the RefSeq Project has decided to create three representative RefSeq records. Mutations in this gene are associated with recessive nemaline myopathy. 4703 NEB nebulin ENSG00000183091
This gene encodes a luminal sarcoplasmic reticulum protein identified by its ability to bind low-density lipoprotein with high affinity. The protein interacts with the cytoplasmic domain of triadin, the main transmembrane protein of the junctional sarcoplasmic reticulum (SR) of skeletal muscle. The protein functions in the regulation of releasable calcium into the SR. 3270 HRC histidine rich calcium binding protein ENSG00000130528
The scaffolding protein encoded by this gene is the main component of the caveolae plasma membranes found in most cell types. The protein links integrin subunits to the tyrosine kinase FYN, an initiating step in coupling integrins to the Ras-ERK pathway and promoting cell cycle progression. The gene is a tumor suppressor gene candidate and a negative regulator of the Ras-p42/44 mitogen-activated kinase cascade. Caveolin 1 and caveolin 2 are located next to each other on chromosome 7 and express colocalizing proteins that form a stable hetero-oligomeric complex. Mutations in this gene have been associated with Berardinelli-Seip congenital lipodystrophy. Alternatively spliced transcripts encode alpha and beta isoforms of caveolin 1. 857 CAV1 caveolin 1 ENSG00000105974
NA 4892 NRAP nebulin related anchoring protein ENSG00000197893
This gene encodes the alpha-3 chain, one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The alpha-3 chain of type VI collagen is much larger than the alpha-1 and -2 chains. This difference in size is largely due to an increase in the number of subdomains, similar to von Willebrand Factor type A domains, that are found in the amino terminal globular domain of all the alpha chains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in the type VI collagen genes are associated with Bethlem myopathy, a rare autosomal dominant proximal myopathy with early childhood onset. Mutations in this gene are also a cause of Ullrich congenital muscular dystrophy, also referred to as Ullrich scleroatonic muscular dystrophy, an autosomal recessive congenital myopathy that is more severe than Bethlem myopathy. Multiple transcript variants have been identified, but the full-length nature of only some of these variants has been described. 1293 COL6A3 collagen type VI alpha 3 chain ENSG00000163359
Members of the F-box protein family, such as FBXL16, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]). 146330 FBXL16 F-box and leucine rich repeat protein 16 ENSG00000127585
This gene encodes a member of the ‘fused gene’ family of proteins, which contain N-terminus EF-hand domains and multiple tandem peptide repeats. The encoded protein contains two EF-hand Ca2+ binding domains in its N-terminus and two glutamine- and threonine-rich 60 amino acid repeats in its C-terminus. This gene, also known as squamous epithelial heat shock protein 53, may play a role in the mucosal/epithelial immune response and epidermal differentiation. 49860 CRNN cornulin ENSG00000143536
This intronless gene encodes a 70kDa heat shock protein which is a member of the heat shock protein 70 family. In conjunction with other heat shock proteins, this protein stabilizes existing proteins against aggregation and mediates the folding of newly translated proteins in the cytosol and in organelles. It is also involved in the ubiquitin-proteasome pathway through interaction with the AU-rich element RNA-binding protein 1. The gene is located in the major histocompatibility complex class III region, in a cluster with two closely related genes which encode similar proteins. 3304 HSPA1B heat shock protein family A (Hsp70) member 1B ENSG00000204388
This gene encodes the perlecan protein, which consists of a core protein to which three long chains of glycosaminoglycans (heparan sulfate or chondroitin sulfate) are attached. The perlecan protein is a large multidomain proteoglycan that binds to and cross-links many extracellular matrix components and cell-surface molecules. It has been shown that this protein interacts with laminin, prolargin, collagen type IV, FGFBP1, FBLN2, FGF7 and transthyretin, etc., and it plays essential roles in multiple biological activities. Perlecan is a key component of the vascular extracellular matrix, where it helps to maintain the endothelial barrier function. It is a potent inhibitor of smooth muscle cell proliferation and is thus thought to help maintain vascular homeostasis. It can also promote growth factor (e.g., FGF2) activity and thus stimulate endothelial growth and re-generation. It is a major component of basement membranes, where it is involved in the stabilization of other molecules as well as being involved with glomerular permeability to macromolecules and cell adhesion. Mutations in this gene cause Schwartz-Jampel syndrome type 1, Silverman-Handmaker type of dyssegmental dysplasia, and tardive dyskinesia. Alternative splicing of this gene results in multiple transcript variants. 3339 HSPG2 heparan sulfate proteoglycan 2 ENSG00000142798
This gene encodes an enzyme that oxidizes methionine residues on actin, thereby promoting depolymerization of actin filaments. This protein interacts with and regulates signalling by NEDD9/CAS-L (neural precursor cell expressed, developmentally down-regulated 9). Alternative splicing results in multiple transcript variants. 64780 MICAL1 microtubule associated monooxygenase, calponin and LIM domain containing 1 ENSG00000135596
This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. 7431 VIM vimentin ENSG00000026025
Myosins are actin-based motor proteins that function in the generation of mechanical force in eukaryotic cells. Muscle myosins are heterohexamers composed of 2 myosin heavy chains and 2 pairs of nonidentical myosin light chains. This gene encodes a member of the class II or conventional myosin heavy chains, and functions in skeletal muscle contraction. This gene is found in a cluster of myosin heavy chain genes on chromosome 17. A mutation in this gene results in inclusion body myopathy-3. Multiple alternatively spliced variants, encoding the same protein, have been identified. 4620 MYH2 myosin, heavy chain 2, skeletal muscle, adult ENSG00000125414
N-methylation is one method by which drug and other xenobiotic compounds are metabolized by the liver. This gene encodes the protein responsible for this enzymatic activity which uses S-adenosyl methionine as the methyl donor. 4837 NNMT nicotinamide N-methyltransferase ENSG00000166741
This gene encodes a member of the small leucine-rich proteoglycan (SLRP) family of proteins. The encoded protein induces ectopic bone formation in conjunction with transforming growth factor beta and may regulate osteoblast differentiation. High expression of the encoded protein may be associated with elevated heart left ventricular mass. Alternative splicing results in multiple transcript variants. 4969 OGN osteoglycin ENSG00000106809
NA 54884 RETSAT retinol saturase ENSG00000042445
The protein encoded by this gene belongs to the class-3 semaphorin/collapsin family, whose members function in growth cone guidance during neuronal development. This family member inhibits axonal extension and has been shown to act as a tumor suppressor by inducing apoptosis. Alternative splicing of this gene results in multiple transcript variants. 7869 SEMA3B semaphorin 3B ENSG00000012171
The protein encoded by this gene belongs to the protein phosphatase 1 (PP1) inhibitor family. This protein is an inhibitor of smooth muscle myosin phosphatase, and has higher inhibitory activity when phosphorylated. Inhibition of myosin phosphatase leads to increased myosin phosphorylation and enhanced smooth muscle contraction. Alternatively spliced transcript variants encoding different isoforms have been noted for this gene. 94274 PPP1R14A protein phosphatase 1 regulatory inhibitor subunit 14A ENSG00000167641
The protein encoded by this gene belongs to the family of latent transforming growth factor (TGF)-beta binding proteins (LTBP), which are extracellular matrix proteins with multi-domain structure. This protein is the largest member of the LTBP family possessing unique regions and with most similarity to the fibrillins. It has thus been suggested that it may have multiple functions: as a member of the TGF-beta latent complex, as a structural component of microfibrils, and a role in cell adhesion. 4053 LTBP2 latent transforming growth factor beta binding protein 2 ENSG00000119681
This gene encodes a protein that associates with the nuclear pore complex and participates in the regulation of nuclear transport. The encoded protein interacts with Ras-related nuclear protein 1 (RAN) and regulates guanosine triphosphate (GTP)-binding and exchange. Alternative splicing results in multiple transcript variants. 5905 RANGAP1 Ran GTPase activating protein 1 ENSG00000100401
The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and kininogens. This gene encodes a stefin that functions as an intracellular thiol protease inhibitor. The protein is able to form a dimer stabilized by noncovalent forces, inhibiting papain and cathepsins l, h and b. The protein is thought to play a role in protecting against the proteases leaking from lysosomes. Evidence indicates that mutations in this gene are responsible for the primary defects in patients with progressive myoclonic epilepsy (EPM1). 1476 CSTB cystatin B ENSG00000160213
NA 55827 DCAF6 DDB1 and CUL4 associated factor 6 ENSG00000143164
This gene encodes an integral protein of the inner mitochondrial membrane. The enzyme couples hydride transfer between NAD(H) and NADP(+) to proton translocation across the inner mitochondrial membrane. Under most physiological conditions, the enzyme uses energy from the mitochondrial proton gradient to produce high concentrations of NADPH. The resulting NADPH is used for biosynthesis and in free radical detoxification. Two alternatively spliced variants, encoding the same protein, have been found for this gene. 23530 NNT nicotinamide nucleotide transhydrogenase ENSG00000112992
This gene encodes the integrin alpha X chain protein. Integrins are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain. This protein combines with the beta 2 chain (ITGB2) to form a leukocyte-specific integrin referred to as inactivated-C3b (iC3b) receptor 4 (CR4). The alpha X beta 2 complex seems to overlap the properties of the alpha M beta 2 integrin in the adherence of neutrophils and monocytes to stimulated endothelium cells, and in the phagocytosis of complement coated particles. Two transcript variants encoding different isoforms have been found for this gene. 3687 ITGAX integrin subunit alpha X ENSG00000140678
Complement component C3 plays a central role in the activation of complement system. Its activation is required for both classical and alternative complement activation pathways. The encoded preproprotein is proteolytically processed to generate alpha and beta subunits that form the mature protein, which is then further processed to generate numerous peptide products. The C3a peptide, also known as the C3a anaphylatoxin, modulates inflammation and possesses antimicrobial activity. Mutations in this gene are associated with atypical hemolytic uremic syndrome and age-related macular degeneration in human patients. 718 C3 complement component 3 ENSG00000125730
This gene encodes a soluble protein that is involved in endochondral bone formation, angiogenesis, and tumor biology. It also interacts with a variety of extracellular and structural proteins, contributing to the maintenance of skin integrity and homeostasis. Mutations in this gene are associated with lipoid proteinosis disorder (also known as hyalinosis cutis et mucosae or Urbach-Wiethe disease) that is characterized by generalized thickening of skin, mucosae and certain viscera. Alternatively spliced transcript variants encoding distinct isoforms have been described for this gene. 1893 ECM1 extracellular matrix protein 1 ENSG00000143369
NA 105372824 LOC105372824 uncharacterized LOC105372824 ENSG00000160209
The protein encoded by this gene phosphorylates vitamin B6, a step required for the conversion of vitamin B6 to pyridoxal-5-phosphate, an important cofactor in intermediary metabolism. The encoded protein is cytoplasmic and probably acts as a homodimer. Alternatively spliced transcript variants have been described, but their biological validity has not been determined. 8566 PDXK pyridoxal (pyridoxine, vitamin B6) kinase ENSG00000160209
This gene encodes an enzyme that catalyzes the NAD/NADH-dependent, reversible oxidation of malate to oxaloacetate in many metabolic pathways, including the citric acid cycle. Two main isozymes are known to exist in eukaryotic cells: one is found in the mitochondrial matrix and the other in the cytoplasm. This gene encodes the cytosolic isozyme, which plays a key role in the malate-aspartate shuttle that allows malate to pass through the mitochondrial membrane to be transformed into oxaloacetate for further cellular processes. Alternatively spliced transcript variants have been found for this gene. A recent study showed that a C-terminally extended isoform is produced by use of an alternative in-frame translation termination codon via a stop codon readthrough mechanism, and that this isoform is localized in the peroxisomes. Pseudogenes have been identified on chromosomes X and 6. 4190 MDH1 malate dehydrogenase 1 ENSG00000014641
This gene encodes a conventional non-muscle myosin; this protein should not be confused with the unconventional myosin-9a or 9b (MYO9A or MYO9B). The encoded protein is a myosin IIA heavy chain that contains an IQ domain and a myosin head-like domain which is involved in several important functions, including cytokinesis, cell motility and maintenance of cell shape. Defects in this gene have been associated with non-syndromic sensorineural deafness autosomal dominant type 17, Epstein syndrome, Alport syndrome with macrothrombocytopenia, Sebastian syndrome, Fechtner syndrome and macrothrombocytopenia with progressive sensorineural deafness. 4627 MYH9 myosin, heavy chain 9, non-muscle ENSG00000100345
NA 64397 ZNF106 zinc finger protein 106 ENSG00000103994
NA 3310 HSPA6 heat shock protein family A (Hsp70) member 6 ENSG00000173110
The protein encoded by this gene is an isozyme of the long-chain fatty-acid-coenzyme A ligase family. Although differing in substrate specificity, subcellular localization, and tissue distribution, all isozymes of this family convert free long-chain fatty acids into fatty acyl-CoA esters, and thereby play a key role in lipid biosynthesis and fatty acid degradation. Several transcript variants encoding different isoforms have been found for this gene. 2180 ACSL1 acyl-CoA synthetase long-chain family member 1 ENSG00000151726
The protein encoded by this gene acts as a guanine nucleotide exchange factor for the RHO family of small GTP-binding proteins (RACs). It has been shown to bind to and activate RAC1 by exchanging bound GDP for free GTP. The encoded protein, which is found mainly in the cytoplasm, is activated by phosphatidylinositol-3,4,5-trisphosphate and the beta-gamma subunits of heterotrimeric G proteins. 57580 PREX1 phosphatidylinositol-3,4,5-trisphosphate dependent Rac exchange factor 1 ENSG00000124126
# Save the queried Ensembl IDs for factor 13, one ID per line
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_", 13, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
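
The write step above hard-codes the factor index in the file name. A minimal sketch of the same step as a reusable function follows; `write_factor_genes` and its arguments are hypothetical names, not part of the sourced SFA scripts, and the `unique()` call is an optional assumption that drops IDs repeated because one query matched several records (as ENSG00000160209 does in the table above).

```r
# Hypothetical helper: write one factor's Ensembl IDs, one per line.
write_factor_genes <- function(ids, k, out_dir) {
  out_file <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  # unique() is an assumption: it removes IDs that appear more than once
  write.table(unique(as.character(ids)), out_file,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(out_file)
}

# Toy input standing in for out$query; both IDs appear in the table above.
toy_query <- c("ENSG00000160209", "ENSG00000160209", "ENSG00000014641")
path <- write_factor_genes(toy_query, 13, tempdir())
written <- readLines(path)
```

In the analysis itself the call would presumably be `write_factor_genes(out$query, k, "../utilities/GTEX2013_sparse_load_sqrt")` for each factor `k`.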

Factor 14 Annotations

out <- mygene::queryMany(gene_list[14, ], scopes = "ensembl.gene",
                         fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
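
The message above refers to query terms that matched more than one record or none at all. A rough base-R check on the returned table can list both kinds; this is only a sketch on a toy data frame, not the output of `returnall=TRUE`. The helper name `check_query_hits` and the ID `ENSG_FAKE` are made up for illustration, and the sketch assumes that `notfound` is `TRUE` for unmatched terms and `NA` otherwise.

```r
# Toy stand-in for as.data.frame(out); real results carry more columns.
toy <- data.frame(
  query    = c("ENSG00000160209", "ENSG00000160209",
               "ENSG00000014641", "ENSG_FAKE"),  # ENSG_FAKE is invented
  symbol   = c("LOC105372824", "PDXK", "MDH1", NA),
  notfound = c(NA, NA, NA, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: list query IDs matched more than once or not at all.
check_query_hits <- function(df) {
  dup <- unique(df$query[duplicated(df$query)])
  # The notfound column may be absent when every term was matched
  miss <- if ("notfound" %in% names(df)) {
    df$query[df$notfound %in% TRUE]
  } else {
    character(0)
  }
  list(duplicates = dup, missing = miss)
}

hits <- check_query_hits(toy)
```

Running the check before writing the gene list shows whether duplicated IDs would be written more than once.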
kable(as.data.frame(out))
summary query name X_id symbol notfound
The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. ENSG00000175206 natriuretic peptide A 4878 NPPA NA
Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. ENSG00000197616 myosin, heavy chain 6, cardiac muscle, alpha 4624 MYH6 NA
Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). ENSG00000159251 actin, alpha, cardiac muscle 1 70 ACTC1 NA
NA ENSG00000242349 NPPA antisense RNA 1 ENSG00000242349 NPPA-AS1 NA
The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. ENSG00000148677 ankyrin repeat domain 1 27063 ANKRD1 NA
This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. ENSG00000115414 fibronectin 1 2335 FN1 NA
This gene encodes a multifunctional protein. The encoded preproprotein is proteolytically processed to generate the mature enzyme. This enzyme includes two domains with distinct catalytic activities, a peptidylglycine alpha-hydroxylating monooxygenase (PHM) domain and a peptidyl-alpha-hydroxyglycine alpha-amidating lyase (PAL) domain. These catalytic domains work sequentially to catalyze the conversion of neuroendocrine peptides to active alpha-amidated products. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. ENSG00000145730 peptidylglycine alpha-amidating monooxygenase 5066 PAM NA
This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. ENSG00000198125 myoglobin 4151 MB NA
This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. ENSG00000108821 collagen type I alpha 1 1277 COL1A1 NA
Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. ENSG00000092054 myosin, heavy chain 7, cardiac muscle, beta 4625 MYH7 NA
This gene encodes a protein that is a member of the dickkopf family. The secreted protein contains two cysteine rich regions and is involved in embryonic development through its interactions with the Wnt signaling pathway. The expression of this gene is decreased in a variety of cancer cell lines and it may function as a tumor suppressor gene. Alternative splicing results in multiple transcript variants encoding the same protein. ENSG00000050165 dickkopf WNT signaling pathway inhibitor 3 27122 DKK3 NA
NA ENSG00000106631 myosin light chain 7 58498 MYL7 NA
Synaptopodin is an actin-associated protein that may play a role in actin-based cell shape and motility. The name synaptopodin derives from the protein’s associations with postsynaptic densities and dendritic spines and with renal podocytes (Mundel et al., 1997 [PubMed 9314539]). ENSG00000171992 synaptopodin 11346 SYNPO NA
This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol into the sarcoplasmic reticulum lumen, and is involved in regulation of the contraction/relaxation cycle. Mutations in this gene cause Darier-White disease, also known as keratosis follicularis, an autosomal dominant skin disorder characterized by loss of adhesion between epidermal cells and abnormal keratinization. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000174437 ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 2 488 ATP2A2 NA
NA ENSG00000173641 heat shock protein family B (small) member 7 27129 HSPB7 NA
This gene encodes the light subunit of the ferritin protein. Ferritin is the major intracellular iron storage protein in prokaryotes and eukaryotes. It is composed of 24 subunits of the heavy and light ferritin chains. Variation in ferritin subunit composition may affect the rates of iron uptake and release in different tissues. A major function of ferritin is the storage of iron in a soluble and nontoxic state. Defects in this light chain ferritin gene are associated with several neurodegenerative diseases and hyperferritinemia-cataract syndrome. This gene has multiple pseudogenes. ENSG00000087086 ferritin, light polypeptide 2512 FTL NA
The intracellular fatty acid-binding proteins (FABPs) belong to a multigene family. FABPs are divided into at least three distinct types, namely the hepatic-, intestinal- and cardiac-type. They form 14-15 kDa proteins and are thought to participate in the uptake, intracellular metabolism and/or transport of long-chain fatty acids. They may also be responsible in the modulation of cell growth and proliferation. Fatty acid-binding protein 3 gene contains four exons and its function is to arrest growth of mammary epithelial cells. This gene is a candidate tumor suppressor gene for human breast cancer. Alternative splicing results in multiple transcript variants. ENSG00000121769 fatty acid binding protein 3 2170 FABP3 NA
Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a muscle-specific, alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Several transcript variants encoding different isoforms have been found for this gene. ENSG00000077522 actinin alpha 2 88 ACTN2 NA
This gene encodes the alpha-3 chain, one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The alpha-3 chain of type VI collagen is much larger than the alpha-1 and -2 chains. This difference in size is largely due to an increase in the number of subdomains, similar to von Willebrand Factor type A domains, that are found in the amino terminal globular domain of all the alpha chains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in the type VI collagen genes are associated with Bethlem myopathy, a rare autosomal dominant proximal myopathy with early childhood onset. Mutations in this gene are also a cause of Ullrich congenital muscular dystrophy, also referred to as Ullrich scleroatonic muscular dystrophy, an autosomal recessive congenital myopathy that is more severe than Bethlem myopathy. Multiple transcript variants have been identified, but the full-length nature of only some of these variants has been described. ENSG00000163359 collagen type VI alpha 3 chain 1293 COL6A3 NA
The product of this gene belongs to the actin-binding proteins ADF family. This family of proteins is responsible for enhancing the turnover rate of actin in vivo. This gene encodes the actin depolymerizing protein that severs actin filaments (F-actin) and binds to actin monomers (G-actin). Two transcript variants encoding distinct isoforms have been identified for this gene. ENSG00000125868 destrin, actin depolymerizing factor 11034 DSTN NA
Sarcomere assembly is regulated by the muscle protein titin. Titin is a giant elastic protein with kinase activity that extends half the length of a sarcomere. It serves as a scaffold to which myofibrils and other muscle related proteins are attached. This gene encodes a protein found in striated and cardiac muscle that binds to the titin Z1-Z2 domains and is a substrate of titin kinase, interactions thought to be critical to sarcomere assembly. Mutations in this gene are associated with limb-girdle muscular dystrophy type 2G. ENSG00000173991 titin-cap 8557 TCAP NA
This gene encodes a member of the serine proteinase inhibitor (serpin) superfamily. This member is the principal inhibitor of tissue plasminogen activator (tPA) and urokinase (uPA), and hence is an inhibitor of fibrinolysis. Defects in this gene are the cause of plasminogen activator inhibitor-1 deficiency (PAI-1 deficiency), and high concentrations of the gene product are associated with thrombophilia. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000106366 serpin family E member 1 5054 SERPINE1 NA
MYBPC3 encodes the cardiac isoform of myosin-binding protein C. Myosin-binding protein C is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. MYBPC3, the cardiac isoform, is expressed exclusively in heart muscle. Regulatory phosphorylation of the cardiac isoform in vivo by cAMP-dependent protein kinase (PKA) upon adrenergic stimulation may be linked to modulation of cardiac contraction. Mutations in MYBPC3 are one cause of familial hypertrophic cardiomyopathy. ENSG00000134571 myosin binding protein C, cardiac 4607 MYBPC3 NA
This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. ENSG00000155657 titin 7273 TTN NA
This gene encodes one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The product of this gene contains several domains similar to von Willebrand Factor type A domains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in this gene are associated with Bethlem myopathy and Ullrich scleroatonic muscular dystrophy. Three transcript variants have been identified for this gene. ENSG00000142173 collagen type VI alpha 2 1292 COL6A2 NA
This gene encodes a nonsarcomeric myosin regulatory light chain. This protein is activated by phosphorylation and regulates smooth muscle and non-muscle cell contraction. This protein may also be involved in DNA damage repair by sequestering the transcriptional regulator apoptosis-antagonizing transcription factor (AATF)/Che-1 which functions as a repressor of p53-driven apoptosis. Alternate splicing results in multiple transcript variants. A pseudogene of this gene is found on chromosome 8. ENSG00000101608 myosin light chain 12A 10627 MYL12A NA
NA ENSG00000163661 pentraxin 3 5806 PTX3 NA
The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. ENSG00000107796 actin, alpha 2, smooth muscle, aorta 59 ACTA2 NA
This gene encodes a putative transcription factor with two LIM zinc-binding domains. The encoded protein may participate in the differentiation of smooth muscle tissue. Alternative splicing results in multiple transcript variants. ENSG00000182809 cysteine rich protein 2 1397 CRIP2 NA
This gene encodes a protein with similarity to follistatin, an activin-binding protein. It contains an FS module, a follistatin-like sequence containing 10 conserved cysteine residues. This gene product is thought to be an autoantigen associated with rheumatoid arthritis. ENSG00000163430 follistatin like 1 11167 FSTL1 NA
Troponin is a central regulatory protein of striated muscle contraction, and together with tropomyosin, is located on the actin filament. Troponin consists of 3 subunits: TnI, which is the inhibitor of actomyosin ATPase; TnT, which contains the binding site for tropomyosin; and TnC, the protein encoded by this gene. The binding of calcium to TnC abolishes the inhibitory action of TnI, thus allowing the interaction of actin with myosin, the hydrolysis of ATP, and the generation of tension. Mutations in this gene are associated with cardiomyopathy dilated type 1Z. ENSG00000114854 troponin C1, slow skeletal and cardiac type 7134 TNNC1 NA
This gene encodes a member of the myosin superfamily. The protein represents a conventional non-muscle myosin; it should not be confused with the unconventional myosin-10 (MYO10). Myosins are actin-dependent motor proteins with diverse functions including regulation of cytokinesis, cell motility, and cell polarity. Mutations in this gene have been associated with May-Hegglin anomaly and developmental defects in brain and heart. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000133026 myosin, heavy chain 10, non-muscle 4628 MYH10 NA
This gene encodes a member of the fibulin family of extracellular matrix glycoproteins. Like all members of this family, the encoded protein contains tandemly repeated epidermal growth factor-like repeats followed by a C-terminus fibulin-type domain. This gene is upregulated in malignant gliomas and may play a role in the aggressive nature of these tumors. Mutations in this gene are associated with Doyne honeycomb retinal dystrophy. Alternatively spliced transcript variants that encode the same protein have been described. ENSG00000115380 EGF containing fibulin like extracellular matrix protein 1 2202 EFEMP1 NA
The protein encoded by this gene is a subunit of a disulfide-linked homotrimeric protein. This protein is an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein can bind to fibrinogen, fibronectin, laminin, type V collagen and integrins alpha-V/beta-1. This protein has been shown to play roles in platelet aggregation, angiogenesis, and tumorigenesis. ENSG00000137801 thrombospondin 1 7057 THBS1 NA
This gene encodes a member of the fibrillin family of proteins. The encoded preproprotein is proteolytically processed to generate two proteins including the extracellular matrix component fibrillin-1 and the protein hormone asprosin. Fibrillin-1 is an extracellular matrix glycoprotein that serves as a structural component of calcium-binding microfibrils. These microfibrils provide force-bearing structural support in elastic and nonelastic connective tissue throughout the body. Asprosin, secreted by white adipose tissue, has been shown to regulate glucose homeostasis. Mutations in this gene are associated with Marfan syndrome and the related MASS phenotype, as well as ectopia lentis syndrome, Weill-Marchesani syndrome, Shprintzen-Goldberg syndrome and neonatal progeroid syndrome. ENSG00000166147 fibrillin 1 2200 FBN1 NA
The protein encoded by this gene is a secreted, extracellular matrix protein containing an Arg-Gly-Asp (RGD) motif and calcium-binding EGF-like domains. It promotes adhesion of endothelial cells through interaction of integrins and the RGD motif. It is prominently expressed in developing arteries but less so in adult vessels. However, its expression is reinduced in balloon-injured vessels and atherosclerotic lesions, notably in intimal vascular smooth muscle cells and endothelial cells. Therefore, the protein encoded by this gene may play a role in vascular development and remodeling. Defects in this gene are a cause of autosomal dominant cutis laxa, autosomal recessive cutis laxa type I (CL type I), and age-related macular degeneration type 3 (ARMD3). ENSG00000140092 fibulin 5 10516 FBLN5 NA
NA ENSG00000125148 metallothionein 2A 4502 MT2A NA
Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000101335 myosin light chain 9 10398 MYL9 NA
NA ENSG00000091986 coiled-coil domain containing 80 151887 CCDC80 NA
The protein encoded by this gene, Aldolase A (fructose-bisphosphate aldolase), is a glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Three aldolase isozymes (A, B, and C), encoded by three different genes, are differentially expressed during development. Aldolase A is found in the developing embryo and is produced in even greater amounts in adult muscle. Aldolase A expression is repressed in adult liver, kidney and intestine and similar to aldolase C levels in brain and other nervous tissue. Aldolase A deficiency has been associated with myopathy and hemolytic anemia. Alternative splicing and alternative promoter usage result in multiple transcript variants. Related pseudogenes have been identified on chromosomes 3 and 10. ENSG00000149925 aldolase, fructose-bisphosphate A 226 ALDOA NA
This gene encodes a member of the C1 family of peptidases. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate multiple protein products. These products include the cathepsin B light and heavy chains, which can dimerize to form the double chain form of the enzyme. This enzyme is a lysosomal cysteine protease with both endopeptidase and exopeptidase activity that may play a role in protein turnover. It is also known as amyloid precursor protein secretase and is involved in the proteolytic processing of amyloid precursor protein (APP). Incomplete proteolytic processing of APP has been suggested to be a causative factor in Alzheimer’s disease, the most common cause of dementia. Overexpression of the encoded protein has been associated with esophageal adenocarcinoma and other tumors. Multiple pseudogenes of this gene have been identified. ENSG00000164733 cathepsin B 1508 CTSB NA
NA ENSG00000259716 NA NA NA TRUE
This gene encodes a member of the fibrillar collagen family, and plays a role during the calcification of cartilage and the transition of cartilage to bone. The encoded protein product is a preproprotein. It includes an N-terminal signal peptide, which is followed by an N-terminal propeptide, mature peptide and a C-terminal propeptide. The N-terminal propeptide contains thrombospondin N-terminal-like and laminin G-like domains. The mature peptide is a major triple-helical region. The C-terminal propeptide, also known as COLFI domain, plays crucial roles in tissue growth and repair. Mutations in this gene cause Steel syndrome. Alternatively spliced transcript variants have been found, but the full-length nature of some variants has not been determined. ENSG00000196739 collagen type XXVII alpha 1 85301 COL27A1 NA
The protein encoded by this gene is a member of the formin-binding-protein family. The protein contains an N-terminal Fer/Cdc42-interacting protein 4 (CIP4) homology (FCH) domain followed by a coiled-coil domain, a proline-rich motif, a second coiled-coil domain, a Rho family protein-binding domain (RBD), and a C-terminal SH3 domain. This protein binds sorting nexin 2 (SNX2), tankyrase (TNKS), and dynamin; an interaction between this protein and formin has not been demonstrated yet in human. ENSG00000187239 formin binding protein 1 23048 FNBP1 NA
This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. ENSG00000140416 tropomyosin 1 (alpha) 7168 TPM1 NA
This gene encodes an RGD-containing protein that binds to type I, II and IV collagens. The RGD motif is found in many extracellular matrix proteins modulating cell adhesion and serves as a ligand recognition sequence for several integrins. This protein plays a role in cell-collagen interactions and may be involved in endochondral bone formation in cartilage. The protein is induced by transforming growth factor-beta and acts to inhibit cell adhesion. Mutations in this gene are associated with multiple types of corneal dystrophy. ENSG00000120708 transforming growth factor beta induced 7045 TGFBI NA
The giant protein titin, together with its associated proteins, interconnects the major structure of sarcomeres, the M bands and Z discs. The C-terminal end of the titin string extends into the M line, where it binds tightly to M-band constituents of apparent molecular masses of 190 kD and 165 kD. The predicted MYOM2 protein contains 1,465 amino acids. Like MYOM1, MYOM2 has a unique N-terminal domain followed by 12 repeat domains with strong homology to either fibronectin type III or immunoglobulin C2 domains. Protein sequence comparisons suggested that the MYOM2 protein and bovine M protein are identical. ENSG00000036448 myomesin 2 9172 MYOM2 NA
The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. ENSG00000135821 glutamate-ammonia ligase 2752 GLUL NA
FABP4 encodes the fatty acid binding protein found in adipocytes. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. It is thought that FABPs' roles include fatty acid uptake, transport, and metabolism. ENSG00000170323 fatty acid binding protein 4 2167 FABP4 NA
This gene encodes a member of the filamin family. The encoded protein interacts with glycoprotein Ib alpha as part of the process to repair vascular injuries. The platelet glycoprotein Ib complex includes glycoprotein Ib alpha, and it binds the actin cytoskeleton. Mutations in this gene have been found in several conditions: atelosteogenesis type 1 and type 3; boomerang dysplasia; autosomal dominant Larsen syndrome; and spondylocarpotarsal synostosis syndrome. Multiple alternatively spliced transcript variants that encode different protein isoforms have been described for this gene. ENSG00000136068 filamin B 2317 FLNB NA
This gene is a member of the natriuretic peptide family and encodes a secreted protein which functions as a cardiac hormone. The protein undergoes two cleavage events, one within the cell and a second after secretion into the blood. The protein’s biological actions include natriuresis, diuresis, vasorelaxation, inhibition of renin and aldosterone secretion, and a key role in cardiovascular homeostasis. A high concentration of this protein in the bloodstream is indicative of heart failure. The protein also acts as an antimicrobial peptide with antibacterial and antifungal activity. Mutations in this gene have been associated with postmenopausal osteoporosis. ENSG00000120937 natriuretic peptide B 4879 NPPB NA
NA ENSG00000225630 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 2 pseudogene 28 ENSG00000225630 MTND2P28 NA
The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. ENSG00000143632 actin, alpha 1, skeletal muscle 58 ACTA1 NA
Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. ENSG00000163017 actin, gamma 2, smooth muscle, enteric 72 ACTG2 NA
The Golgi apparatus, which participates in glycosylation and transport of proteins and lipids in the secretory pathway, consists of a series of stacked, flattened membrane sacs referred to as cisternae. Interactions between the Golgi and microtubules are thought to be important for the reorganization of the Golgi after it fragments during mitosis. The golgins constitute a family of proteins which are localized to the Golgi. This gene encodes a golgin which structurally resembles its family member GOLGA2, suggesting that they may share a similar function. There are many similar copies of this gene on chromosome 15. Alternative splicing results in multiple transcript variants. ENSG00000175265 golgin A8 family member A 23015 GOLGA8A NA
This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. ENSG00000011465 decorin 1634 DCN NA
Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene acts as a monomer, is induced by retinoic acid, and appears to be involved in apoptosis. Finally, the encoded protein is the autoantigen implicated in celiac disease. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000198959 transglutaminase 2 7052 TGM2 NA
N-methylation is one method by which drug and other xenobiotic compounds are metabolized by the liver. This gene encodes the protein responsible for this enzymatic activity which uses S-adenosyl methionine as the methyl donor. ENSG00000166741 nicotinamide N-methyltransferase 4837 NNMT NA
The protein encoded by this gene specifies the cardiac muscle family member of the calsequestrin family. Calsequestrin is localized to the sarcoplasmic reticulum in cardiac and slow skeletal muscle cells. The protein is a calcium binding protein that stores calcium for muscle function. Mutations in this gene cause stress-induced polymorphic ventricular tachycardia, also referred to as catecholaminergic polymorphic ventricular tachycardia 2 (CPVT2), a disease characterized by bidirectional ventricular tachycardia that may lead to cardiac arrest. ENSG00000118729 calsequestrin 2 845 CASQ2 NA
The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. ENSG00000111341 matrix Gla protein 4256 MGP NA
This gene encodes a member of carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. ENSG00000106624 AE binding protein 1 165 AEBP1 NA
This gene encodes a preproprotein that is proteolytically processed to form multiple protein products. The major encoded protein product, lactadherin, is a membrane glycoprotein that promotes phagocytosis of apoptotic cells. This protein has also been implicated in wound healing, autoimmune disease, and cancer. Lactadherin can be further processed to form a smaller cleavage product, medin, which comprises the major protein component of aortic medial amyloid (AMA). Alternative splicing results in multiple transcript variants. ENSG00000140545 milk fat globule-EGF factor 8 protein 4240 MFGE8 NA
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000170477 keratin 4 3851 KRT4 NA
The protein encoded by this gene belongs to the family of latent TGF-beta binding proteins (LTBPs). The secretion and activation of TGF-betas is regulated by their association with latency-associated proteins and with latent TGF-beta binding proteins. The product of this gene targets latent complexes of transforming growth factor beta to the extracellular matrix, where the latent cytokine is subsequently activated by several different mechanisms. Alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000049323 latent transforming growth factor beta binding protein 1 4052 LTBP1 NA
The protein encoded by this gene belongs to the complex I 9kDa subunit family. Mammalian complex I of mitochondrial respiratory chain is composed of 45 different subunits. This protein has NADH dehydrogenase activity and oxidoreductase activity. It transfers electrons from NADH to the respiratory chain. The immediate electron acceptor for the enzyme is believed to be ubiquinone. ENSG00000189043 NDUFA4, mitochondrial complex associated 4697 NDUFA4 NA
This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. ENSG00000143248 regulator of G-protein signaling 5 8490 RGS5 NA
MYL3 encodes myosin light chain 3, an alkali light chain also referred to in the literature as both the ventricular isoform and the slow skeletal muscle isoform. Mutations in MYL3 have been identified as a cause of mid-left ventricular chamber type hypertrophic cardiomyopathy. ENSG00000160808 myosin light chain 3 4634 MYL3 NA
The protein encoded by this gene serves to anchor phosphodiesterase 4D to the Golgi/centrosome region of the cell. Defects in this gene may be a cause of myeloproliferative disorder (MBD) associated with eosinophilia. Several transcript variants encoding different isoforms have been found for this gene. ENSG00000178104 phosphodiesterase 4D interacting protein 9659 PDE4DIP NA
This gene encodes a member of the low-density lipoprotein receptor family of proteins. The encoded preproprotein is proteolytically processed by furin to generate 515 kDa and 85 kDa subunits that form the mature receptor (PMID: 8546712). This receptor is involved in several cellular processes, including intracellular signaling, lipid homeostasis, and clearance of apoptotic cells. In addition, the encoded protein is necessary for the alpha 2-macroglobulin-mediated clearance of secreted amyloid precursor protein and beta-amyloid, the main component of amyloid plaques found in Alzheimer patients. Expression of this gene decreases with age and has been found to be lower than controls in brain tissue from Alzheimer’s disease patients. ENSG00000123384 LDL receptor related protein 1 4035 LRP1 NA
This gene encodes a calmodulin- and actin-binding protein that plays an essential role in the regulation of smooth muscle and nonmuscle contraction. The conserved domain of this protein possesses the binding activities to Ca(2+)-calmodulin, actin, tropomyosin, myosin, and phospholipids. This protein is a potent inhibitor of the actin-tropomyosin activated myosin MgATPase, and serves as a mediating factor for Ca(2+)-dependent inhibition of smooth muscle contraction. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. ENSG00000122786 caldesmon 1 800 CALD1 NA
This gene encodes a PDZ domain-containing protein. PDZ motifs are modular protein-protein interaction domains consisting of 80-120 amino acid residues. PDZ domain-containing proteins interact with each other in cytoskeletal assembly or with other proteins involved in targeting and clustering of membrane proteins. The protein encoded by this gene interacts with alpha-actinin-2 through its N-terminal PDZ domain and with protein kinase C via its C-terminal LIM domains. The LIM domain is a cysteine-rich motif defined by 50-60 amino acids containing two zinc-binding modules. This protein also interacts with all three members of the myozenin family. Mutations in this gene have been associated with myofibrillar myopathy and dilated cardiomyopathy. Alternatively spliced transcript variants encoding different isoforms have been identified; all isoforms have N-terminal PDZ domains while only longer isoforms (1, 2 and 5) have C-terminal LIM domains. ENSG00000122367 LIM domain binding 3 11155 LDB3 NA
This gene encodes a regulatory subunit of protein phosphatase-1 (PP1). PP1 catalyzes reversible protein phosphorylation, which is important in a wide range of cellular activities: neuronal, muscular, RNA splicing, protein synthesis, cell death, and glycogen metabolism, to name just a few. By interacting with different regulatory subunits, PP1 is directed to different parts of the cell, to different substrates, or to respond to extracellular signals. ENSG00000119938 protein phosphatase 1 regulatory subunit 3C 5507 PPP1R3C NA
The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms; however, the full-length nature of some of these variants has not yet been determined. ENSG00000118194 troponin T2, cardiac type 7139 TNNT2 NA
Myosin is a hexameric ATPase cellular motor protein. It is composed of two myosin heavy chains, two nonphosphorylatable myosin alkali light chains, and two phosphorylatable myosin regulatory light chains. This gene encodes a myosin alkali light chain that is found in embryonic muscle and adult atria. Two alternatively spliced transcript variants encoding the same protein have been found for this gene. ENSG00000198336 myosin light chain 4 4635 MYL4 NA
This gene is a member of the aggrecan/versican proteoglycan family. The protein encoded is a large chondroitin sulfate proteoglycan and is a major component of the extracellular matrix. This protein is involved in cell adhesion, proliferation, migration and angiogenesis and plays a central role in tissue morphogenesis and maintenance. Mutations in this gene are the cause of Wagner syndrome type 1. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000038427 versican 1462 VCAN NA
This gene encodes an enzyme that catalyzes the NAD/NADH-dependent, reversible oxidation of malate to oxaloacetate in many metabolic pathways, including the citric acid cycle. Two main isozymes are known to exist in eukaryotic cells: one is found in the mitochondrial matrix and the other in the cytoplasm. This gene encodes the cytosolic isozyme, which plays a key role in the malate-aspartate shuttle that allows malate to pass through the mitochondrial membrane to be transformed into oxaloacetate for further cellular processes. Alternatively spliced transcript variants have been found for this gene. A recent study showed that a C-terminally extended isoform is produced by use of an alternative in-frame translation termination codon via a stop codon readthrough mechanism, and that this isoform is localized in the peroxisomes. Pseudogenes have been identified on chromosomes X and 6. ENSG00000014641 malate dehydrogenase 1 4190 MDH1 NA
NA ENSG00000180139 ACTA2 antisense RNA 1 ENSG00000180139 ACTA2-AS1 NA
NA ENSG00000126803 heat shock protein family A (Hsp70) member 2 3306 HSPA2 NA
This gene is a member of the insulin-like growth factor binding protein (IGFBP) family and encodes a protein with an IGFBP domain and a thyroglobulin type-I domain. The protein forms a ternary complex with insulin-like growth factor acid-labile subunit (IGFALS) and either insulin-like growth factor (IGF) I or II. In this form, it circulates in the plasma, prolonging the half-life of IGFs and altering their interaction with cell surface receptors. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. ENSG00000146674 insulin like growth factor binding protein 3 3486 IGFBP3 NA
This gene encodes a protein that is a member of the dickkopf family. It is a secreted protein with two cysteine rich regions and is involved in embryonic development through its inhibition of the WNT signaling pathway. Elevated levels of DKK1 in bone marrow plasma and peripheral blood are associated with the presence of osteolytic bone lesions in patients with multiple myeloma. ENSG00000107984 dickkopf WNT signaling pathway inhibitor 1 22943 DKK1 NA
Ribosomes, the complexes that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L3P family of ribosomal proteins and it is located in the cytoplasm. The protein can bind to the HIV-1 TAR mRNA, and it has been suggested that the protein contributes to tat-mediated transactivation. This gene is co-transcribed with several small nucleolar RNA genes, which are located in several of this gene’s introns. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. ENSG00000100316 ribosomal protein L3 6122 RPL3 NA
This gene encodes a member of the mannose receptor family of proteins that contain a fibronectin type II domain and multiple C-type lectin-like domains. The encoded protein plays a role in extracellular matrix remodeling by mediating the internalization and lysosomal degradation of collagen ligands. Expression of this gene may play a role in the tumorigenesis and metastasis of several malignancies including breast cancer, gliomas and metastatic bone disease. ENSG00000011028 mannose receptor C type 2 9902 MRC2 NA
Fibulin 1 is a secreted glycoprotein that becomes incorporated into a fibrillar extracellular matrix. Calcium-binding is apparently required to mediate its binding to laminin and nidogen. It mediates platelet adhesion via binding fibrinogen. Four splice variants which differ in the 3’ end have been identified. Each variant encodes a different isoform, but no functional distinctions have been identified among the four variants. ENSG00000077942 fibulin 1 2192 FBLN1 NA
The protein encoded by this gene is a cytosolic protein which contains a phosphotyrosine binding (PTB) domain. The PTB domain has been found to interact with the cytoplasmic tail of the LDL receptor. Mutations in this gene lead to LDL receptor malfunction and cause the disorder autosomal recessive hypercholesterolaemia. ENSG00000157978 low density lipoprotein receptor adaptor protein 1 26119 LDLRAP1 NA
Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a cytoplasmic ribosomal protein that is a component of the 40S subunit. The protein belongs to the S6E family of ribosomal proteins. It is the major substrate of protein kinases in the ribosome, with subsets of five C-terminal serine residues phosphorylated by different protein kinases. Phosphorylation is induced by a wide range of stimuli, including growth factors, tumor-promoting agents, and mitogens. Dephosphorylation occurs at growth arrest. The protein may contribute to the control of cell growth and proliferation through the selective translation of particular classes of mRNA. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. ENSG00000137154 ribosomal protein S6 6194 RPS6 NA
This gene encodes a member of the NAD-dependent glycerol-3-phosphate dehydrogenase family. The encoded protein plays a critical role in carbohydrate and lipid metabolism by catalyzing the reversible conversion of dihydroxyacetone phosphate (DHAP) and reduced nicotinamide adenine dinucleotide (NADH) to glycerol-3-phosphate (G3P) and NAD+. The encoded cytosolic protein and mitochondrial glycerol-3-phosphate dehydrogenase also form a glycerol phosphate shuttle that facilitates the transfer of reducing equivalents from the cytosol to mitochondria. Mutations in this gene are a cause of transient infantile hypertriglyceridemia. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000167588 glycerol-3-phosphate dehydrogenase 1 2819 GPD1 NA
The protein encoded by this gene is a small secreted cysteine-rich protein and a member of the CCN family of regulatory proteins. CCN family proteins associate with the extracellular matrix and play an important role in cardiovascular and skeletal development, fibrosis and cancer development. ENSG00000136999 nephroblastoma overexpressed 4856 NOV NA
The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may be also involved in the regulation of non-rapid eye movement sleep. ENSG00000107317 prostaglandin D2 synthase 5730 PTGDS NA
This gene encodes an alpha chain for one of the low abundance fibrillar collagens. Fibrillar collagen molecules are trimers that can be composed of one or more types of alpha chains. Type V collagen is found in tissues containing type I collagen and appears to regulate the assembly of heterotypic fibers composed of both type I and type V collagen. This gene product is closely related to type XI collagen and it is possible that the collagen chains of types V and XI constitute a single collagen type with tissue-specific chain combinations. Mutations in this gene are associated with Ehlers-Danlos syndrome, types I and II. ENSG00000204262 collagen type V alpha 2 chain 1290 COL5A2 NA
The protein encoded by this gene is a mitogen that is secreted by vascular endothelial cells. The encoded protein plays a role in chondrocyte proliferation and differentiation, cell adhesion in many cell types, and is related to platelet-derived growth factor. Certain polymorphisms in this gene have been linked with a higher incidence of systemic sclerosis. ENSG00000118523 connective tissue growth factor 1490 CTGF NA
NA ENSG00000163486 NA NA NA TRUE
The protein encoded by this gene is a type I transmembrane glycoprotein which shows homology to the pMEL17 precursor, a melanocyte-specific protein. GPNMB shows expression in the lowly metastatic human melanoma cell lines and xenografts but does not show expression in the highly metastatic cell lines. GPNMB may be involved in growth delay and reduction of metastatic potential. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000136235 glycoprotein nmb 10457 GPNMB NA
This gene encodes an extracellular matrix protein with a spatially and temporally restricted tissue distribution. This protein is homohexameric with disulfide-linked subunits, and contains multiple EGF-like and fibronectin type-III domains. It is implicated in guidance of migrating neurons as well as axons during development, synaptic plasticity, and neuronal regeneration. ENSG00000041982 tenascin C 3371 TNC NA
This gene encodes an enzyme which catalyzes the first step in the hydrolysis of triglycerides in adipose tissue. Mutations in this gene are associated with neutral lipid storage disease with myopathy. ENSG00000177666 patatin like phospholipase domain containing 2 57104 PNPLA2 NA
The protein encoded by this gene coats lipid storage droplets in adipocytes, thereby protecting them until they can be broken down by hormone-sensitive lipase. The encoded protein is the major cAMP-dependent protein kinase substrate in adipocytes and, when unphosphorylated, may play a role in the inhibition of lipolysis. Alternatively spliced transcript variants varying in the 5’ UTR, but encoding the same protein, have been found for this gene. ENSG00000166819 perilipin 1 5346 PLIN1 NA
The protein encoded by this gene catalyzes the transport of phosphate into the mitochondrial matrix, either by proton cotransport or in exchange for hydroxyl ions. The protein contains three related segments arranged in tandem which are related to those found in other characterized members of the mitochondrial carrier family. Both the N-terminal and C-terminal regions of this protein protrude toward the cytosol. Multiple alternatively spliced transcript variants have been isolated. ENSG00000075415 solute carrier family 25 member 3 5250 SLC25A3 NA
This gene encodes cytochrome b5 reductase, which includes a membrane-bound form in somatic cells (anchored in the endoplasmic reticulum, mitochondrial and other membranes) and a soluble form in erythrocytes. The membrane-bound form exists mainly on the cytoplasmic side of the endoplasmic reticulum and functions in desaturation and elongation of fatty acids, in cholesterol biosynthesis, and in drug metabolism. The erythrocyte form is located in a soluble fraction of circulating erythrocytes and is involved in methemoglobin reduction. The membrane-bound form has both membrane-binding and catalytic domains, while the soluble form has only the catalytic domain. Alternate splicing results in multiple transcript variants. Mutations in this gene cause methemoglobinemias. ENSG00000100243 cytochrome b5 reductase 3 1727 CYB5R3 NA
This gene encodes a classical cadherin and member of the cadherin superfamily. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate a calcium-dependent cell adhesion molecule and glycoprotein. This protein plays a role in the establishment of left-right asymmetry, development of the nervous system and the formation of cartilage and bone. ENSG00000170558 cadherin 2 1000 CDH2 NA
This gene encodes a cytoskeletal protein that is required for organizing the actin cytoskeleton. The protein is a component of actin-containing microfilaments, and it is involved in the control of cell shape, adhesion, and contraction. Polymorphisms in this gene are associated with a susceptibility to pancreatic cancer type 1, and also with a risk for myocardial infarction. Alternative splicing results in multiple transcript variants. ENSG00000129116 palladin, cytoskeletal associated protein 23022 PALLD NA
NA ENSG00000156299 T-cell lymphoma invasion and metastasis 1 7074 TIAM1 NA
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_", 14, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
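
The per-factor export is repeated by hand for each factor, with the factor index hard-coded. A minimal sketch of the same export as a loop is shown here. It is an assumption, not the code used for this analysis: it writes the rows of gene_list directly (which may differ from out$query if a query returns duplicate hits), and it uses a toy gene_list and a temporary directory (out_dir, a hypothetical name) so that it runs on its own.

```r
# Toy stand-in for gene_list: factors in rows, top genes in columns.
# The IDs are copied from the annotation tables in this document.
gene_list <- rbind(
  c("ENSG00000011028", "ENSG00000077942"),
  c("ENSG00000204983", "ENSG00000091704")
)

# Assumption: a temporary directory replaces the real output folder.
out_dir <- tempdir()

# Write one file per factor, one Ensembl ID per line.
for (k in seq_len(nrow(gene_list))) {
  out_file <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(gene_list[k, ], out_file,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
}
```

Using seq_len(nrow(gene_list)) keeps the file index tied to the row that was written, which avoids mismatches between the hard-coded number and the factor being annotated.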

Factor 15 Annotations

out <- mygene::queryMany(gene_list[15, ], scopes = "ensembl.gene",
                         fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name summary X_id query symbol notfound
protease, serine 1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. 5644 ENSG00000204983 PRSS1 NA
carboxypeptidase A1 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. 1357 ENSG00000091704 CPA1 NA
pancreatic lipase This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. 5406 ENSG00000175535 PNLIP NA
chymotrypsin like elastase family member 3A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. 10136 ENSG00000142789 CELA3A NA
glycoprotein 2 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. 2813 ENSG00000169347 GP2 NA
myelin basic protein The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. 4155 ENSG00000197971 MBP NA
carboxypeptidase B1 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma. 1360 ENSG00000153002 CPB1 NA
colipase The protein encoded by this gene is a cofactor needed by pancreatic lipase for efficient dietary lipid hydrolysis. It binds to the C-terminal, non-catalytic domain of lipase, thereby stabilizing an active conformation and considerably increasing the overall hydrophobic binding site. The gene product allows lipase to anchor noncovalently to the surface of lipid micelles, counteracting the destabilizing influence of intestinal bile salts. This cofactor is only expressed in pancreatic acinar cells, suggesting regulation of expression by tissue-specific elements. Three transcript variants encoding different isoforms have been found for this gene. 1208 ENSG00000137392 CLPS NA
chymotrypsin like elastase family member 3B Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3B has little elastolytic activity. Like most of the human elastases, elastase 3B is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3B preferentially cleaves proteins after alanine residues. Elastase 3B may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1, and excretion of this protein in fecal material is frequently used as a measure of pancreatic function in clinical assays. 23436 ENSG00000219073 CELA3B NA
chymotrypsinogen B2 NA 440387 ENSG00000168928 CTRB2 NA
chymotrypsinogen B1 The protein encoded by this gene is one of a family of serine proteases that is secreted into the gastrointestinal tract as an inactive precursor, which is activated by proteolytic cleavage with trypsin. 1504 ENSG00000168925 CTRB1 NA
carboxyl ester lipase The protein encoded by this gene is a glycoprotein secreted from the pancreas into the digestive tract and from the lactating mammary gland into human milk. The physiological role of this protein is in cholesterol and lipid-soluble vitamin ester hydrolysis and absorption. This encoded protein promotes large chylomicron production in the intestine. Also its presence in plasma suggests its interactions with cholesterol and oxidized lipoproteins to modulate the progression of atherosclerosis. In pancreatic tumoral cells, this encoded protein is thought to be sequestrated within the Golgi compartment and is probably not secreted. This gene contains a variable number of tandem repeat (VNTR) polymorphism in the coding region that may influence the function of the encoded protein. 1056 ENSG00000170835 CEL NA
amylase, alpha 2B (pancreatic) Amylases are secreted proteins that hydrolyze 1,4-alpha-glucoside bonds in oligosaccharides and polysaccharides, and thus catalyze the first step in digestion of dietary starch and glycogen. The human genome has a cluster of several amylase genes that are expressed at high levels in either salivary gland or pancreas. This gene encodes an amylase isoenzyme produced by the pancreas. 280 ENSG00000240038 AMY2B NA
hemoglobin subunit beta The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin cause beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. 3043 ENSG00000244734 HBB NA
regenerating family member 1 alpha This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. 5967 ENSG00000115386 REG1A NA
carboxypeptidase A2 Three different forms of human pancreatic procarboxypeptidase A have been isolated. The encoded protein represents the A2 form, which is a monomeric protein with different biochemical properties from the A1 and A3 forms. The A2 form of pancreatic procarboxypeptidase acts on aromatic C-terminal residues and is a secreted protein. 1358 ENSG00000158516 CPA2 NA
keratin 13 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. 3860 ENSG00000171401 KRT13 NA
chymotrypsin like elastase family member 2A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Like most of the human elastases, elastase 2A is secreted from the pancreas as a zymogen. In other species, elastase 2A has been shown to preferentially cleave proteins after leucine, methionine, and phenylalanine residues. 63036 ENSG00000142615 CELA2A NA
amylase, alpha 2A (pancreatic) This gene encodes a member of the alpha-amylase family of proteins. Amylases are secreted proteins that hydrolyze 1,4-alpha-glucoside bonds in oligosaccharides and polysaccharides, catalyzing the first step in digestion of dietary starch and glycogen. This gene and several family members are present in a gene cluster on chromosome 1. This gene encodes an amylase isoenzyme produced by the pancreas. 279 ENSG00000243480 AMY2A NA
chymotrypsin C This gene encodes a member of the peptidase S1 family. The encoded protein is a serum calcium-decreasing factor that has chymotrypsin-like protease activity. Alternatively spliced transcript variants have been observed, but their full-length nature has not been determined. 11330 ENSG00000162438 CTRC NA
phospholipase A2 group IB This gene encodes a secreted member of the phospholipase A2 (PLA2) class of enzymes, which is produced by the pancreatic acinar cells. The encoded calcium-dependent enzyme catalyzes the hydrolysis of the sn-2 position of membrane glycerophospholipids to release arachidonic acid (AA) and lysophospholipids. AA is subsequently converted by downstream metabolic enzymes to several bioactive lipophilic compounds (eicosanoids), including prostaglandins (PGs) and leukotrienes (LTs). The enzyme may be involved in several physiological processes including cell contraction, cell proliferation and pathological response. 5319 ENSG00000170890 PLA2G1B NA
pancreatic lipase related protein 1 NA 5407 ENSG00000187021 PNLIPRP1 NA
AHNAK nucleoprotein NA 79026 ENSG00000124942 AHNAK NA
insulin like growth factor binding protein 5 NA 3488 ENSG00000115461 IGFBP5 NA
NA NA ENSG00000266844 ENSG00000266844 RP11-862L9.3 NA
fibronectin 1 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. 2335 ENSG00000115414 FN1 NA
keratin 4 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3851 ENSG00000170477 KRT4 NA
NA NA NA ENSG00000250606 NA TRUE
hemoglobin subunit alpha 2 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. 3040 ENSG00000188536 HBA2 NA
tropomyosin 2 (beta) This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 7169 ENSG00000198467 TPM2 NA
NA NA NA ENSG00000165862 NA TRUE
regenerating family member 1 beta This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. 5968 ENSG00000172023 REG1B NA
NA NA ENSG00000240338 ENSG00000240338 RP11-331F4.4 NA
gelsolin The protein encoded by this gene binds to the ‘plus’ ends of actin monomers and filaments to prevent monomer exchange. The encoded calcium-regulated protein functions in both assembly and disassembly of actin filaments. Defects in this gene are a cause of familial amyloidosis Finnish type (FAF). Multiple transcript variants encoding several different isoforms have been found for this gene. 2934 ENSG00000148180 GSN NA
synaptotagmin like 1 NA 84958 ENSG00000142765 SYTL1 NA
SEL1L ERAD E3 ligase adaptor subunit The protein encoded by this gene is part of a protein complex required for the retrotranslocation or dislocation of misfolded proteins from the endoplasmic reticulum lumen to the cytosol, where they are degraded by the proteasome in a ubiquitin-dependent manner. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 6400 ENSG00000071537 SEL1L NA
small proline rich protein 3 NA 6707 ENSG00000163209 SPRR3 NA
prostaglandin D2 synthase The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may be also involved in the regulation of non-rapid eye movement sleep. 5730 ENSG00000107317 PTGDS NA
heat shock protein 90kDa alpha family class A member 1 The protein encoded by this gene is an inducible molecular chaperone that functions as a homodimer. The encoded protein aids in the proper folding of specific target proteins by use of an ATPase activity that is modulated by co-chaperones. Two transcript variants encoding different isoforms have been found for this gene. 3320 ENSG00000080824 HSP90AA1 NA
collagen type VI alpha 3 chain This gene encodes the alpha-3 chain, one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The alpha-3 chain of type VI collagen is much larger than the alpha-1 and -2 chains. This difference in size is largely due to an increase in the number of subdomains, similar to von Willebrand Factor type A domains, that are found in the amino terminal globular domain of all the alpha chains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in the type VI collagen genes are associated with Bethlem myopathy, a rare autosomal dominant proximal myopathy with early childhood onset. Mutations in this gene are also a cause of Ullrich congenital muscular dystrophy, also referred to as Ullrich scleroatonic muscular dystrophy, an autosomal recessive congenital myopathy that is more severe than Bethlem myopathy. Multiple transcript variants have been identified, but the full-length nature of only some of these variants has been described. 1293 ENSG00000163359 COL6A3 NA
maturin, neural progenitor differentiation regulator homolog (Xenopus) NA 222166 ENSG00000180354 MTURN NA
NMDA receptor synaptonuclear signaling and neuronal migration factor The protein encoded by this gene is involved in guidance of olfactory axon projections and migration of luteinizing hormone-releasing hormone neurons. Defects in this gene are a cause of idiopathic hypogonadotropic hypogonadism (IHH). Several transcript variants encoding different isoforms have been found for this gene. 26012 ENSG00000165802 NSMF NA
clusterin The protein encoded by this gene is a secreted chaperone that can under some stress conditions also be found in the cell cytosol. It has been suggested to be involved in several basic biological events such as cell death, tumor progression, and neurodegenerative disorders. Alternate splicing results in both coding and non-coding variants. 1191 ENSG00000120885 CLU NA
metallothionein 2A NA 4502 ENSG00000125148 MT2A NA
syncollin NA 342898 ENSG00000179751 SYCN NA
NA NA ENSG00000229732 ENSG00000229732 AC019349.5 NA
ubiquitin B This gene encodes ubiquitin, one of the most conserved proteins known. Ubiquitin has a major role in targeting cellular proteins for degradation by the 26S proteasome. It is also involved in the maintenance of chromatin structure, the regulation of gene expression, and the stress response. Ubiquitin is synthesized as a precursor protein consisting of either polyubiquitin chains or a single ubiquitin moiety fused to an unrelated protein. This gene consists of three direct repeats of the ubiquitin coding sequence with no spacer sequence. Consequently, the protein is expressed as a polyubiquitin precursor with a final amino acid after the last repeat. An aberrant form of this protein has been detected in patients with Alzheimer’s disease and Down syndrome. Pseudogenes of this gene are located on chromosomes 1, 2, 13, and 17. Alternative splicing results in multiple transcript variants. 7314 ENSG00000170315 UBB NA
coiled-coil domain containing 136 NA 64753 ENSG00000128596 CCDC136 NA
plasmalemma vesicle associated protein NA 83483 ENSG00000130300 PLVAP NA
transferrin This gene encodes a glycoprotein with an approximate molecular weight of 76.5 kDa. It is thought to have been created as a result of an ancient gene duplication event that led to generation of homologous C and N-terminal domains each of which binds one ion of ferric iron. The function of this protein is to transport iron from the intestine, reticuloendothelial system, and liver parenchymal cells to all proliferating cells in the body. This protein may also have a physiologic role as granulocyte/pollen-binding protein (GPBP) involved in the removal of certain organic matter and allergens from serum. 7018 ENSG00000091513 TF NA
phospholipase C eta 2 PLCH2 is a member of the PLC-eta family of the phosphoinositide-specific phospholipase C (PLC) superfamily of enzymes that cleave PtdIns(4,5) P2 to generate second messengers inositol 1,4,5-trisphosphate and diacylglycerol (Zhou et al., 2005 [PubMed 16107206]). 9651 ENSG00000149527 PLCH2 NA
hemoglobin subunit alpha 1 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. 3039 ENSG00000206172 HBA1 NA
integral membrane protein 2C NA 81618 ENSG00000135916 ITM2C NA
ribosomal protein S3 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit, where it forms part of the domain where translation is initiated. The protein belongs to the S3P family of ribosomal proteins. Studies of the mouse and rat proteins have demonstrated that the protein has an extraribosomal role as an endonuclease involved in the repair of UV-induced DNA damage. The protein appears to be located in both the cytoplasm and nucleus but not in the nucleolus. Higher levels of expression of this gene in colon adenocarcinomas and adenomatous polyps compared to adjacent normal colonic mucosa have been observed. This gene is co-transcribed with the small nucleolar RNA genes U15A and U15B, which are located in its first and fifth introns, respectively. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. 6188 ENSG00000149273 RPS3 NA
calmodulin 2 (phosphorylase kinase, delta) This gene is a member of the calmodulin gene family. There are three distinct calmodulin genes dispersed throughout the genome that encode the identical protein, but differ at the nucleotide level. Calmodulin is a calcium binding protein that plays a role in signaling pathways, cell cycle progression and proliferation. Several infants with severe forms of long-QT syndrome (LQTS) who displayed life-threatening ventricular arrhythmias together with delayed neurodevelopment and epilepsy were found to have mutations in either this gene or another member of the calmodulin gene family (PMID:23388215). Mutations in this gene have also been identified in patients with less severe forms of LQTS (PMID:24917665), while mutations in another calmodulin gene family member have been associated with catecholaminergic polymorphic ventricular tachycardia (CPVT)(PMID:23040497), a rare disorder thought to be the cause of a significant fraction of sudden cardiac deaths in young individuals. Pseudogenes of this gene are found on chromosomes 10, 13, and 17. Alternative splicing results in multiple transcript variants encoding different isoforms. 805 ENSG00000143933 CALM2 NA
cornulin This gene encodes a member of the ‘fused gene’ family of proteins, which contain N-terminus EF-hand domains and multiple tandem peptide repeats. The encoded protein contains two EF-hand Ca2+ binding domains in its N-terminus and two glutamine- and threonine-rich 60 amino acid repeats in its C-terminus. This gene, also known as squamous epithelial heat shock protein 53, may play a role in the mucosal/epithelial immune response and epidermal differentiation. 49860 ENSG00000143536 CRNN NA
microtubule associated serine/threonine kinase 3 NA 23031 ENSG00000099308 MAST3 NA
major histocompatibility complex, class I, B HLA-B belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon 1 encodes the leader peptide, exon 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Hundreds of HLA-B alleles have been described. 3106 ENSG00000234745 HLA-B NA
keratin 6A The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. As many as six of this type II cytokeratin (KRT6) have been identified; the multiplicity of the genes is attributed to successive gene duplication events. The genes are expressed with family members KRT16 and/or KRT17 in the filiform papillae of the tongue, the stratified epithelial lining of oral mucosa and esophagus, the outer root sheath of hair follicles, and the glandular epithelia. This KRT6 gene in particular encodes the most abundant isoform. Mutations in these genes have been associated with pachyonychia congenita. In addition, peptides from the C-terminal region of the protein have antimicrobial activity against bacterial pathogens. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3853 ENSG00000205420 KRT6A NA
transgelin The protein encoded by this gene is a transformation and shape-change sensitive actin cross-linking/gelling protein found in fibroblasts and smooth muscle. Its expression is down-regulated in many cell lines, and this down-regulation may be an early and sensitive marker for the onset of transformation. A functional role of this protein is unclear. Two transcript variants encoding the same protein have been found for this gene. 6876 ENSG00000149591 TAGLN NA
fibroblast growth factor receptor 3 This gene encodes a member of the fibroblast growth factor receptor (FGFR) family, with its amino acid sequence being highly conserved between members and among divergent species. FGFR family members differ from one another in their ligand affinities and tissue distribution. A full-length representative protein would consist of an extracellular region, composed of three immunoglobulin-like domains, a single hydrophobic membrane-spanning segment and a cytoplasmic tyrosine kinase domain. The extracellular portion of the protein interacts with fibroblast growth factors, setting in motion a cascade of downstream signals, ultimately influencing mitogenesis and differentiation. This particular family member binds acidic and basic fibroblast growth hormone and plays a role in bone development and maintenance. Mutations in this gene lead to craniosynostosis and multiple types of skeletal dysplasia. Three alternatively spliced transcript variants that encode different protein isoforms have been described. 2261 ENSG00000068078 FGFR3 NA
chromogranin A The protein encoded by this gene is a member of the chromogranin/secretogranin family of neuroendocrine secretory proteins. It is found in secretory vesicles of neurons and endocrine cells. This gene product is a precursor to three biologically active peptides; vasostatin, pancreastatin, and parastatin. These peptides act as autocrine or paracrine negative modulators of the neuroendocrine system. Two other peptides, catestatin and chromofungin, have antimicrobial activity and antifungal activity, respectively. Two transcript variants encoding different isoforms have been found for this gene. 1113 ENSG00000100604 CHGA NA
stathmin 1 This gene belongs to the stathmin family of genes. It encodes a ubiquitous cytosolic phosphoprotein proposed to function as an intracellular relay integrating regulatory signals of the cellular environment. The encoded protein is involved in the regulation of the microtubule filament system by destabilizing microtubules. It prevents assembly and promotes disassembly of microtubules. Multiple transcript variants encoding different isoforms have been found for this gene. 3925 ENSG00000117632 STMN1 NA
alcohol dehydrogenase 1B (class I), beta polypeptide The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. 125 ENSG00000196616 ADH1B NA
metallothionein 3 NA 4504 ENSG00000087250 MT3 NA
cerebral endothelial cell adhesion molecule NA 51148 ENSG00000167123 CERCAM NA
nuclear paraspeckle assembly transcript 1 (non-protein coding) This gene produces a long non-coding RNA (lncRNA) transcribed from the multiple endocrine neoplasia locus. This lncRNA is retained in the nucleus where it forms the core structural component of the paraspeckle sub-organelles. It may act as a transcriptional regulator for numerous genes, including some genes involved in cancer progression. 283131 ENSG00000245532 NEAT1 NA
MT-CO1 pseudogene 12 NA ENSG00000237973 ENSG00000237973 MTCO1P12 NA
heparan sulfate proteoglycan 2 This gene encodes the perlecan protein, which consists of a core protein to which three long chains of glycosaminoglycans (heparan sulfate or chondroitin sulfate) are attached. The perlecan protein is a large multidomain proteoglycan that binds to and cross-links many extracellular matrix components and cell-surface molecules. It has been shown that this protein interacts with laminin, prolargin, collagen type IV, FGFBP1, FBLN2, FGF7 and transthyretin, etc., and it plays essential roles in multiple biological activities. Perlecan is a key component of the vascular extracellular matrix, where it helps to maintain the endothelial barrier function. It is a potent inhibitor of smooth muscle cell proliferation and is thus thought to help maintain vascular homeostasis. It can also promote growth factor (e.g., FGF2) activity and thus stimulate endothelial growth and re-generation. It is a major component of basement membranes, where it is involved in the stabilization of other molecules as well as being involved with glomerular permeability to macromolecules and cell adhesion. Mutations in this gene cause Schwartz-Jampel syndrome type 1, Silverman-Handmaker type of dyssegmental dysplasia, and tardive dyskinesia. Alternative splicing of this gene results in multiple transcript variants. 3339 ENSG00000142798 HSPG2 NA
apolipoprotein D This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. 347 ENSG00000189058 APOD NA
outer dense fiber of sperm tails 2 The outer dense fibers are cytoskeletal structures that surround the axoneme in the middle piece and principal piece of the sperm tail. The fibers function in maintaining the elastic structure and recoil of the sperm tail as well as in protecting the tail from shear forces during epididymal transport and ejaculation. Defects in the outer dense fibers lead to abnormal sperm morphology and infertility. This gene encodes one of the major outer dense fiber proteins. Alternative splicing results in multiple transcript variants. The longer transcripts, also known as ‘Cenexins’, encode proteins with a C-terminal extension that are differentially targeted to somatic centrioles and thought to be crucial for the formation of microtubule organizing centers. 4957 ENSG00000136811 ODF2 NA
NA NA NA ENSG00000259716 NA TRUE
keratin 5 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the basal layer of the epidermis with family member KRT14. Mutations in these genes have been associated with a complex of diseases termed epidermolysis bullosa simplex. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3852 ENSG00000186081 KRT5 NA
perilipin 2 The protein encoded by this gene belongs to the perilipin family, members of which coat intracellular lipid storage droplets. This protein is associated with the lipid globule surface membrane material, and may be involved in development and maintenance of adipose tissue. However, it is not restricted to adipocytes as previously thought, but is found in a wide range of cultured cell lines, including fibroblasts, endothelial and epithelial cells, and tissues, such as lactating mammary gland, adrenal cortex, Sertoli and Leydig cells, and hepatocytes in alcoholic liver cirrhosis, suggesting that it may serve as a marker of lipid accumulation in diverse cell types and diseases. Alternatively spliced transcript variants have been found for this gene. 123 ENSG00000147872 PLIN2 NA
Rh family C glycoprotein NA 51458 ENSG00000140519 RHCG NA
energy homeostasis associated NA 375704 ENSG00000168913 ENHO NA
ribosomal protein L13a Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a member of the L13P family of ribosomal proteins that is a component of the 60S subunit. The encoded protein also plays a role in the repression of inflammatory genes as a component of the IFN-gamma-activated inhibitor of translation (GAIT) complex. This gene is co-transcribed with the small nucleolar RNA genes U32, U33, U34, and U35, which are located in the second, fourth, fifth, and sixth introns, respectively. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed throughout the genome. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 23521 ENSG00000142541 RPL13A NA
fatty acid synthase The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. 2194 ENSG00000169710 FASN NA
actin, alpha 1, skeletal muscle The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. 58 ENSG00000143632 ACTA1 NA
ribosomal protein S6 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a cytoplasmic ribosomal protein that is a component of the 40S subunit. The protein belongs to the S6E family of ribosomal proteins. It is the major substrate of protein kinases in the ribosome, with subsets of five C-terminal serine residues phosphorylated by different protein kinases. Phosphorylation is induced by a wide range of stimuli, including growth factors, tumor-promoting agents, and mitogens. Dephosphorylation occurs at growth arrest. The protein may contribute to the control of cell growth and proliferation through the selective translation of particular classes of mRNA. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. 6194 ENSG00000137154 RPS6 NA
periplakin The protein encoded by this gene is a component of desmosomes and of the epidermal cornified envelope in keratinocytes. The N-terminal domain of this protein interacts with the plasma membrane and its C-terminus interacts with intermediate filaments. Through its rod domain, this protein forms complexes with envoplakin. This protein may serve as a link between the cornified envelope and desmosomes as well as intermediate filaments. AKT1/PKB, a protein kinase mediating a variety of cell growth and survival signaling processes, is reported to interact with this protein, suggesting a possible role for this protein as a localization signal in AKT1-mediated signaling. 5493 ENSG00000118898 PPL NA
chymotrypsin like elastase family member 2B Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Like most of the human elastases, elastase 2B is secreted from the pancreas as a zymogen. In other species, elastase 2B has been shown to preferentially cleave proteins after leucine, methionine, and phenylalanine residues. 51032 ENSG00000215704 CELA2B NA
G protein subunit gamma 7 NA 2788 ENSG00000176533 GNG7 NA
F-box and leucine rich repeat protein 16 Members of the F-box protein family, such as FBXL16, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]). 146330 ENSG00000127585 FBXL16 NA
serine peptidase inhibitor, Kunitz type, 2 This gene encodes a transmembrane protein with two extracellular Kunitz domains that inhibits a variety of serine proteases. The protein inhibits HGF activator which prevents the formation of active hepatocyte growth factor. This gene is a putative tumor suppressor, and mutations in this gene result in congenital sodium diarrhea. Multiple transcript variants encoding different isoforms have been found for this gene. 10653 ENSG00000167642 SPINT2 NA
junction plakoglobin This gene encodes a major cytoplasmic protein which is the only known constituent common to submembranous plaques of both desmosomes and intermediate junctions. This protein forms distinct complexes with cadherins and desmosomal cadherins and is a member of the catenin family since it contains a distinct repeating amino acid motif called the armadillo repeat. Mutation in this gene has been associated with Naxos disease. Alternative splicing occurs in this gene; however, not all transcripts have been fully described. 3728 ENSG00000173801 JUP NA
uncharacterized LOC105372824 NA 105372824 ENSG00000160209 LOC105372824 NA
pyridoxal (pyridoxine, vitamin B6) kinase The protein encoded by this gene phosphorylates vitamin B6, a step required for the conversion of vitamin B6 to pyridoxal-5-phosphate, an important cofactor in intermediary metabolism. The encoded protein is cytoplasmic and probably acts as a homodimer. Alternatively spliced transcript variants have been described, but their biological validity has not been determined. 8566 ENSG00000160209 PDXK NA
neurogranin Neurogranin (NRGN) is the human homolog of the neuron-specific rat RC3/neurogranin gene. This gene encodes a postsynaptic protein kinase substrate that binds calmodulin in the absence of calcium. The NRGN gene contains four exons and three introns. The exons 1 and 2 encode the protein and exons 3 and 4 contain untranslated sequences. It is suggested that the NRGN is a direct target for thyroid hormone in human brain, and that control of expression of this gene could underlay many of the consequences of hypothyroidism on mental states during development as well as in adult subjects. 4900 ENSG00000154146 NRGN NA
family with sequence similarity 107 member A NA 11170 ENSG00000168309 FAM107A NA
versican This gene is a member of the aggrecan/versican proteoglycan family. The protein encoded is a large chondroitin sulfate proteoglycan and is a major component of the extracellular matrix. This protein is involved in cell adhesion, proliferation, migration and angiogenesis and plays a central role in tissue morphogenesis and maintenance. Mutations in this gene are the cause of Wagner syndrome type 1. Multiple transcript variants encoding different isoforms have been found for this gene. 1462 ENSG00000038427 VCAN NA
creatine kinase B The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in brain as well as in other tissues, and as a heterodimer with a similar muscle isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. A pseudogene of this gene has been characterized. 1152 ENSG00000166165 CKB NA
destrin, actin depolymerizing factor The product of this gene belongs to the actin-binding proteins ADF family. This family of proteins is responsible for enhancing the turnover rate of actin in vivo. This gene encodes the actin depolymerizing protein that severs actin filaments (F-actin) and binds to actin monomers (G-actin). Two transcript variants encoding distinct isoforms have been identified for this gene. 11034 ENSG00000125868 DSTN NA
plexin B1 NA 5364 ENSG00000164050 PLXNB1 NA
aldolase, fructose-bisphosphate A The protein encoded by this gene, Aldolase A (fructose-bisphosphate aldolase), is a glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Three aldolase isozymes (A, B, and C), encoded by three different genes, are differentially expressed during development. Aldolase A is found in the developing embryo and is produced in even greater amounts in adult muscle. Aldolase A expression is repressed in adult liver, kidney and intestine and similar to aldolase C levels in brain and other nervous tissue. Aldolase A deficiency has been associated with myopathy and hemolytic anemia. Alternative splicing and alternative promoter usage results in multiple transcript variants. Related pseudogenes have been identified on chromosomes 3 and 10. 226 ENSG00000149925 ALDOA NA
follistatin like 1 This gene encodes a protein with similarity to follistatin, an activin-binding protein. It contains an FS module, a follistatin-like sequence containing 10 conserved cysteine residues. This gene product is thought to be an autoantigen associated with rheumatoid arthritis. 11167 ENSG00000163430 FSTL1 NA
microsomal glutathione S-transferase 1 The MAPEG (Membrane Associated Proteins in Eicosanoid and Glutathione metabolism) family consists of six human proteins, two of which are involved in the production of leukotrienes and prostaglandin E, important mediators of inflammation. Other family members, demonstrating glutathione S-transferase and peroxidase activities, are involved in cellular defense against toxic, carcinogenic, and pharmacologically active electrophilic compounds. This gene encodes a protein that catalyzes the conjugation of glutathione to electrophiles and the reduction of lipid hydroperoxides. This protein is localized to the endoplasmic reticulum and outer mitochondrial membrane where it is thought to protect these membranes from oxidative stress. Several transcript variants, some non-protein coding and some protein coding, have been found for this gene. 4257 ENSG00000008394 MGST1 NA
glyceraldehyde-3-phosphate dehydrogenase This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. 2597 ENSG00000111640 GAPDH NA
myelin regulatory factor This gene encodes a transcription factor that is required for central nervous system myelination and may regulate oligodendrocyte differentiation. It is thought to act by increasing the expression of genes that effect myelin production but may also directly promote myelin gene expression. Loss of a similar gene in mouse models results in severe demyelination. Alternative splicing results in multiple transcript variants. 745 ENSG00000124920 MYRF NA
synaptopodin 2 NA 171024 ENSG00000172403 SYNPO2 NA
sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 2 This gene encodes a protein which binds with glycosaminoglycans to form part of the extracellular matrix. The protein contains thyroglobulin type-1, follistatin-like, and calcium-binding domains, and has glycosaminoglycan attachment sites in the acidic C-terminal region. Three alternatively spliced transcript variants that encode different protein isoforms have been described for this gene. 9806 ENSG00000107742 SPOCK2 NA
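# Save the queried Ensembl IDs for factor 15, one per line.
# out$query repeats an ID when the query matched more than one record.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_", 15, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)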
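
The factor 15 table contains one query with no match (ENSG00000259716, where `notfound` is TRUE) and one query that returned two records (ENSG00000160209, matched to both LOC105372824 and PDXK), so `out$query` is not strictly one ID per annotated gene. The following is a minimal sketch of one way to tidy such a result before writing it out. It runs on a small hand-built data frame rather than the real `out`, and the helper name `clean_query_ids` is hypothetical: it is not part of mygene or of this analysis.

```r
# Toy stand-in for as.data.frame(out); rows copied from the factor 15 table.
toy_out <- data.frame(
  query    = c("ENSG00000149273", "ENSG00000259716",
               "ENSG00000160209", "ENSG00000160209"),
  symbol   = c("RPS3", NA, "LOC105372824", "PDXK"),
  notfound = c(NA, TRUE, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop unmatched queries, then keep each ID once.
clean_query_ids <- function(res) {
  nf <- res$notfound
  # Assumption: if every query matched, the notfound column may be absent.
  if (is.null(nf)) nf <- rep(NA, nrow(res))
  matched <- is.na(nf) | !nf
  unique(res$query[matched])
}

ids <- clean_query_ids(toy_out)
print(ids)
```

Applied to the real result, this would be called as `clean_query_ids(as.data.frame(out))`. Whether unmatched or repeated IDs should be dropped depends on what the downstream tool reading the gene list expects, so treat this as an option rather than a required step.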

Factor 16 Annotations

out <- mygene::queryMany(gene_list[16,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id query name summary notfound
TG 7038 ENSG00000042832 thyroglobulin Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. NA
TPO 7173 ENSG00000115705 thyroid peroxidase This gene encodes a membrane-bound glycoprotein. The encoded protein acts as an enzyme and plays a central role in thyroid gland function. The protein functions in the iodination of tyrosine residues in thyroglobulin and phenoxy-ester formation between pairs of iodinated tyrosines to generate the thyroid hormones, thyroxine and triiodothyronine. Mutations in this gene are associated with several disorders of thyroid hormonogenesis, including congenital hypothyroidism, congenital goiter, and thyroid hormone organification defect IIA. Multiple transcript variants encoding distinct isoforms have been identified for this gene, but the full-length nature of some variants has not been determined. NA
ALB 213 ENSG00000163631 albumin Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. NA
PAX8 7849 ENSG00000125618 paired box 8 This gene encodes a member of the paired box (PAX) family of transcription factors. Members of this gene family typically encode proteins that contain a paired box domain, an octapeptide, and a paired-type homeodomain. This nuclear protein is involved in thyroid follicular cell development and expression of thyroid-specific genes. Mutations in this gene have been associated with thyroid dysgenesis, thyroid follicular carcinomas and atypical follicular thyroid adenomas. Alternatively spliced transcript variants encoding different isoforms have been described. NA
HP 3240 ENSG00000257017 haptoglobin This gene encodes a preproprotein, which is processed to yield both alpha and beta chains, which subsequently combine as a tetramer to produce haptoglobin. Haptoglobin functions to bind free plasma hemoglobin, which allows degradative enzymes to gain access to the hemoglobin, while at the same time preventing loss of iron through the kidneys and protecting the kidneys from damage by hemoglobin. Mutations in this gene and/or its regulatory regions cause ahaptoglobinemia or hypohaptoglobinemia. This gene has also been linked to diabetic nephropathy, the incidence of coronary artery disease in type 1 diabetes, Crohn’s disease, inflammatory disease behavior, primary sclerosing cholangitis, susceptibility to idiopathic Parkinson’s disease, and a reduced incidence of Plasmodium falciparum malaria. The protein encoded also exhibits antimicrobial activity against bacteria. A similar duplicated gene is located next to this gene on chromosome 16. Multiple transcript variants encoding different isoforms have been found for this gene. NA
FGA 2243 ENSG00000171560 fibrinogen alpha chain This gene encodes the alpha subunit of the coagulation factor fibrinogen, which is a component of the blood clot. Following vascular injury, the encoded preproprotein is proteolytically processed by thrombin during the conversion of fibrinogen to fibrin. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia, afibrinogenemia and renal amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. NA
FGB 2244 ENSG00000171564 fibrinogen beta chain The protein encoded by this gene is the beta component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including afibrinogenemia, dysfibrinogenemia, hypodysfibrinogenemia and thrombotic tendency. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
C3 718 ENSG00000125730 complement component 3 Complement component C3 plays a central role in the activation of complement system. Its activation is required for both classical and alternative complement activation pathways. The encoded preproprotein is proteolytically processed to generate alpha and beta subunits that form the mature protein, which is then further processed to generate numerous peptide products. The C3a peptide, also known as the C3a anaphylatoxin, modulates inflammation and possesses antimicrobial activity. Mutations in this gene are associated with atypical hemolytic uremic syndrome and age-related macular degeneration in human patients. NA
ORM1 5004 ENSG00000229314 orosomucoid 1 This gene encodes a key acute phase plasma protein. Because of its increase due to acute inflammation, this protein is classified as an acute-phase reactant. The specific function of this protein has not yet been determined; however, it may be involved in aspects of immunosuppression. NA
PTGDS 5730 ENSG00000107317 prostaglandin D2 synthase The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may be also involved in the regulation of non-rapid eye movement sleep. NA
FGG 2266 ENSG00000171557 fibrinogen gamma chain The protein encoded by this gene is the gamma component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia and thrombophilia. Alternative splicing results in transcript variants encoding different isoforms. NA
MBP 4155 ENSG00000197971 myelin basic protein The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. NA
CRP 1401 ENSG00000132693 C-reactive protein, pentraxin-related The protein encoded by this gene belongs to the pentaxin family. It is involved in several host defense related functions based on its ability to recognize foreign pathogens and damaged cells of the host and to initiate their elimination by interacting with humoral and cellular effector systems in the blood. Consequently, the level of this protein in plasma increases greatly during acute phase response to tissue injury, infection, or other inflammatory stimuli. NA
NA NA ENSG00000090920 NA NA TRUE
ALDH2 217 ENSG00000111275 aldehyde dehydrogenase 2 family (mitochondrial) This protein belongs to the aldehyde dehydrogenase family of proteins. Aldehyde dehydrogenase is the second enzyme of the major oxidative pathway of alcohol metabolism. Two major liver isoforms of aldehyde dehydrogenase, cytosolic and mitochondrial, can be distinguished by their electrophoretic mobilities, kinetic properties, and subcellular localizations. Most Caucasians have two major isozymes, while approximately 50% of Orientals have the cytosolic isozyme but not the mitochondrial isozyme. A remarkably higher frequency of acute alcohol intoxication among Orientals than among Caucasians could be related to the absence of a catalytically active form of the mitochondrial isozyme. The increased exposure to acetaldehyde in individuals with the catalytically inactive form may also confer greater susceptibility to many types of cancer. This gene encodes a mitochondrial isoform, which has a low Km for acetaldehydes, and is localized in mitochondrial matrix. Alternative splicing results in multiple transcript variants encoding distinct isoforms. NA
CTSB 1508 ENSG00000164733 cathepsin B This gene encodes a member of the C1 family of peptidases. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate multiple protein products. These products include the cathepsin B light and heavy chains, which can dimerize to form the double chain form of the enzyme. This enzyme is a lysosomal cysteine protease with both endopeptidase and exopeptidase activity that may play a role in protein turnover. It is also known as amyloid precursor protein secretase and is involved in the proteolytic processing of amyloid precursor protein (APP). Incomplete proteolytic processing of APP has been suggested to be a causative factor in Alzheimer’s disease, the most common cause of dementia. Overexpression of the encoded protein has been associated with esophageal adenocarcinoma and other tumors. Multiple pseudogenes of this gene have been identified. NA
TFF3 7033 ENSG00000160180 trefoil factor 3 Members of the trefoil family are characterized by having at least one copy of the trefoil motif, a 40-amino acid domain that contains three conserved disulfides. They are stable secretory proteins expressed in gastrointestinal mucosa. Their functions are not defined, but they may protect the mucosa from insults, stabilize the mucus layer and affect healing of the epithelium. This gene is expressed in goblet cells of the intestines and colon. This gene and two other related trefoil family member genes are found in a cluster on chromosome 21. NA
ANXA1 301 ENSG00000135046 annexin A1 This gene encodes a membrane-localized protein that binds phospholipids. This protein inhibits phospholipase A2 and has anti-inflammatory activity. Loss of function or expression of this gene has been detected in multiple tumors. NA
EMP1 2012 ENSG00000134531 epithelial membrane protein 1 NA NA
GAPDH 2597 ENSG00000111640 glyceraldehyde-3-phosphate dehydrogenase This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. NA
APOC3 345 ENSG00000110245 apolipoprotein C3 Apolipoprotein C-III is a very low density lipoprotein (VLDL) protein. APOC3 inhibits lipoprotein lipase and hepatic lipase; it is thought to delay catabolism of triglyceride-rich particles. The APOA1, APOC3 and APOA4 genes are closely linked in both rat and human genomes. The A-I and A-IV genes are transcribed from the same strand, while the A-I and C-III genes are convergently transcribed. An increase in apoC-III levels induces the development of hypertriglyceridemia. NA
H19 283120 ENSG00000130600 H19, imprinted maternally expressed transcript (non-protein coding) This gene is located in an imprinted region of chromosome 11 near the insulin-like growth factor 2 (IGF2) gene. This gene is only expressed from the maternally-inherited chromosome, whereas IGF2 is only expressed from the paternally-inherited chromosome. The product of this gene is a long non-coding RNA which functions as a tumor suppressor. Mutations in this gene have been associated with Beckwith-Wiedemann Syndrome and Wilms tumorigenesis. Alternative splicing results in multiple transcript variants. NA
ACSL1 2180 ENSG00000151726 acyl-CoA synthetase long-chain family member 1 The protein encoded by this gene is an isozyme of the long-chain fatty-acid-coenzyme A ligase family. Although differing in substrate specificity, subcellular localization, and tissue distribution, all isozymes of this family convert free long-chain fatty acids into fatty acyl-CoA esters, and thereby play a key role in lipid biosynthesis and fatty acid degradation. Several transcript variants encoding different isoforms have been found for this gene. NA
CTSD 1509 ENSG00000117984 cathepsin D This gene encodes a member of the A1 family of peptidases. The encoded preproprotein is proteolytically processed to generate multiple protein products. These products include the cathepsin D light and heavy chains, which heterodimerize to form the mature enzyme. This enzyme exhibits pepsin-like activity and plays a role in protein turnover and in the proteolytic activation of hormones and growth factors. Mutations in this gene play a causal role in neuronal ceroid lipofuscinosis-10 and may be involved in the pathogenesis of several other diseases, including breast cancer and possibly Alzheimer’s disease. NA
PRM2 5620 ENSG00000122304 protamine 2 Protamines substitute for histones in the chromatin of sperm during the haploid phase of spermatogenesis, and are the major DNA-binding proteins in the nucleus of sperm in many vertebrates. They package the sperm DNA into a highly condensed complex in a volume less than 5% of a somatic cell nucleus. Many mammalian species have only one protamine (protamine 1); however, a few species, including human and mouse, have two. This gene encodes protamine 2, which is cleaved to give rise to a family of protamine 2 peptides. Alternatively spliced transcript variants have also been found for this gene. NA
APOH 350 ENSG00000091583 apolipoprotein H Apolipoprotein H has been implicated in a variety of physiologic pathways including lipoprotein metabolism, coagulation, and the production of antiphospholipid autoantibodies. APOH may be a required cofactor for anionic phospholipid binding by the antiphospholipid autoantibodies found in sera of many patients with lupus and primary antiphospholipid syndrome, but it does not seem to be required for the reactivity of antiphospholipid autoantibodies associated with infections. NA
INPP5J 27124 ENSG00000185133 inositol polyphosphate-5-phosphatase J NA NA
ATP1A2 477 ENSG00000018625 ATPase Na+/K+ transporting subunit alpha 2 The protein encoded by this gene belongs to the family of P-type cation transport ATPases, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The catalytic subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes an alpha 2 subunit. Mutations in this gene result in familial basilar or hemiplegic migraines, and in a rare syndrome known as alternating hemiplegia of childhood. NA
UBB 7314 ENSG00000170315 ubiquitin B This gene encodes ubiquitin, one of the most conserved proteins known. Ubiquitin has a major role in targeting cellular proteins for degradation by the 26S proteasome. It is also involved in the maintenance of chromatin structure, the regulation of gene expression, and the stress response. Ubiquitin is synthesized as a precursor protein consisting of either polyubiquitin chains or a single ubiquitin moiety fused to an unrelated protein. This gene consists of three direct repeats of the ubiquitin coding sequence with no spacer sequence. Consequently, the protein is expressed as a polyubiquitin precursor with a final amino acid after the last repeat. An aberrant form of this protein has been detected in patients with Alzheimer’s disease and Down syndrome. Pseudogenes of this gene are located on chromosomes 1, 2, 13, and 17. Alternative splicing results in multiple transcript variants. NA
NPNT 255743 ENSG00000168743 nephronectin NA NA
VIM 7431 ENSG00000026025 vimentin This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. NA
LIPG 9388 ENSG00000101670 lipase G, endothelial type The protein encoded by this gene has substantial phospholipase activity and may be involved in lipoprotein metabolism and vascular biology. This protein is designated a member of the TG lipase family by its sequence and characteristic lid region which provides substrate specificity for enzymes of the TG lipase family. NA
DES 1674 ENSG00000175084 desmin This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. NA
TTN 7273 ENSG00000155657 titin This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. NA
CYP27A1 1593 ENSG00000135929 cytochrome P450 family 27 subfamily A member 1 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This mitochondrial protein oxidizes cholesterol intermediates as part of the bile synthesis pathway. Since the conversion of cholesterol to bile acids is the major route for removing cholesterol from the body, this protein is important for overall cholesterol homeostasis. Mutations in this gene cause cerebrotendinous xanthomatosis, a rare autosomal recessive lipid storage disease. NA
AMBP 259 ENSG00000106927 alpha-1-microglobulin/bikunin precursor This gene encodes a complex glycoprotein secreted in plasma. The precursor is proteolytically processed into distinct functioning proteins: alpha-1-microglobulin, which belongs to the superfamily of lipocalin transport proteins and may play a role in the regulation of inflammatory processes, and bikunin, which is a urinary trypsin inhibitor belonging to the superfamily of Kunitz-type protease inhibitors and plays an important role in many physiological and pathological processes. This gene is located on chromosome 9 in a cluster of lipocalin genes. NA
ATP2A2 488 ENSG00000174437 ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 2 This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol into the sarcoplasmic reticulum lumen, and is involved in regulation of the contraction/relaxation cycle. Mutations in this gene cause Darier-White disease, also known as keratosis follicularis, an autosomal dominant skin disorder characterized by loss of adhesion between epidermal cells and abnormal keratinization. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
ST6GAL1 6480 ENSG00000073849 ST6 beta-galactoside alpha-2,6-sialyltransferase 1 This gene encodes a member of glycosyltransferase family 29. The encoded protein is a type II membrane protein that catalyzes the transfer of sialic acid from CMP-sialic acid to galactose-containing substrates. The protein, which is normally found in the Golgi but can be proteolytically processed to a soluble form, is involved in the generation of the cell-surface carbohydrate determinants and differentiation antigens HB-6, CD75, and CD76. This gene has been incorrectly referred to as CD75. Three transcript variants encoding two different isoforms have been described. NA
FN1 2335 ENSG00000115414 fibronectin 1 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. NA
PLIN2 123 ENSG00000147872 perilipin 2 The protein encoded by this gene belongs to the perilipin family, members of which coat intracellular lipid storage droplets. This protein is associated with the lipid globule surface membrane material, and may be involved in development and maintenance of adipose tissue. However, it is not restricted to adipocytes as previously thought, but is found in a wide range of cultured cell lines, including fibroblasts, endothelial and epithelial cells, and tissues, such as lactating mammary gland, adrenal cortex, Sertoli and Leydig cells, and hepatocytes in alcoholic liver cirrhosis, suggesting that it may serve as a marker of lipid accumulation in diverse cell types and diseases. Alternatively spliced transcript variants have been found for this gene. NA
APOA2 336 ENSG00000158874 apolipoprotein A2 This gene encodes apolipoprotein (apo-) A-II, which is the second most abundant protein of the high density lipoprotein particles. The protein is found in plasma as a monomer, homodimer, or heterodimer with apolipoprotein D. Defects in this gene may result in apolipoprotein A-II deficiency or hypercholesterolemia. NA
ASS1 445 ENSG00000130707 argininosuccinate synthase 1 The protein encoded by this gene catalyzes the penultimate step of the arginine biosynthetic pathway. There are approximately 10 to 14 copies of this gene including the pseudogenes scattered across the human genome, among which the one located on chromosome 9 appears to be the only functional gene for argininosuccinate synthetase. Mutations in the chromosome 9 copy of this gene cause citrullinemia. Two transcript variants encoding the same protein have been found for this gene. NA
CRAT 1384 ENSG00000095321 carnitine O-acetyltransferase This gene encodes carnitine acetyltransferase (CRAT), which is a key enzyme in the metabolic pathway in mitochondria, peroxisomes and endoplasmic reticulum. CRAT catalyzes the reversible transfer of acyl groups from an acyl-CoA thioester to carnitine and regulates the ratio of acylCoA/CoA in the subcellular compartments. Two transcript variants encoding different isoforms have been found for this gene. NA
KRT7 3855 ENSG00000135480 keratin 7 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the simple epithelia lining the cavities of the internal organs and in the gland ducts and blood vessels. The genes encoding the type II cytokeratins are clustered in a region of chromosome 12q12-q13. Alternative splicing may result in several transcript variants; however, not all variants have been fully described. NA
FBXW5 54461 ENSG00000159069 F-box and WD repeat domain containing 5 This gene encodes a member of the F-box protein family, members of which are characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into three classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene contains WD-40 domains, in addition to an F-box motif, so it belongs to the Fbw class. Alternatively spliced transcript variants encoding distinct isoforms have been identified for this gene, however, they were found to be nonsense-mediated mRNA decay (NMD) candidates, hence not represented. NA
PRM1 5619 ENSG00000175646 protamine 1 NA NA
LDB3 11155 ENSG00000122367 LIM domain binding 3 This gene encodes a PDZ domain-containing protein. PDZ motifs are modular protein-protein interaction domains consisting of 80-120 amino acid residues. PDZ domain-containing proteins interact with each other in cytoskeletal assembly or with other proteins involved in targeting and clustering of membrane proteins. The protein encoded by this gene interacts with alpha-actinin-2 through its N-terminal PDZ domain and with protein kinase C via its C-terminal LIM domains. The LIM domain is a cysteine-rich motif defined by 50-60 amino acids containing two zinc-binding modules. This protein also interacts with all three members of the myozenin family. Mutations in this gene have been associated with myofibrillar myopathy and dilated cardiomyopathy. Alternatively spliced transcript variants encoding different isoforms have been identified; all isoforms have N-terminal PDZ domains while only longer isoforms (1, 2 and 5) have C-terminal LIM domains. NA
YBX3 8531 ENSG00000060138 Y-box binding protein 3 NA NA
APOE 348 ENSG00000130203 apolipoprotein E The protein encoded by this gene is a major apoprotein of the chylomicron. It binds to a specific liver and peripheral cell receptor, and is essential for the normal catabolism of triglyceride-rich lipoprotein constituents. This gene maps to chromosome 19 in a cluster with the related apolipoprotein C1 and C2 genes. Mutations in this gene result in familial dysbetalipoproteinemia, or type III hyperlipoproteinemia (HLP III), in which increased plasma cholesterol and triglycerides are the consequence of impaired clearance of chylomicron and VLDL remnants. Alternative splicing results in multiple transcript variants. NA
AFAP1L2 84632 ENSG00000169129 actin filament associated protein 1 like 2 NA NA
CHI3L1 1116 ENSG00000133048 chitinase 3 like 1 Chitinases catalyze the hydrolysis of chitin, which is an abundant glycopolymer found in insect exoskeletons and fungal cell walls. The glycoside hydrolase 18 family of chitinases includes eight human family members. This gene encodes a glycoprotein member of the glycosyl hydrolase 18 family. The protein lacks chitinase activity and is secreted by activated macrophages, chondrocytes, neutrophils and synovial cells. The protein is thought to play a role in the process of inflammation and tissue remodeling. NA
EEF1A2 1917 ENSG00000101210 eukaryotic translation elongation factor 1 alpha 2 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 2) is expressed in brain, heart and skeletal muscle, and the other isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas. This gene may be critical in the development of ovarian cancer. NA
MTCO1P12 ENSG00000237973 ENSG00000237973 MT-CO1 pseudogene 12 NA NA
FLNC 2318 ENSG00000128591 filamin C This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene. NA
PINK1 65018 ENSG00000158828 PTEN induced putative kinase 1 This gene encodes a serine/threonine protein kinase that localizes to mitochondria. It is thought to protect cells from stress-induced mitochondrial dysfunction. Mutations in this gene cause one form of autosomal recessive early-onset Parkinson disease. NA
HSP90B1 7184 ENSG00000166598 heat shock protein 90kDa beta family member 1 This gene encodes a member of a family of adenosine triphosphate(ATP)-metabolizing molecular chaperones with roles in stabilizing and folding other proteins. The encoded protein is localized to melanosomes and the endoplasmic reticulum. Expression of this protein is associated with a variety of pathogenic states, including tumor formation. There is a microRNA gene located within the 5’ exon of this gene. There are pseudogenes for this gene on chromosomes 1 and 15. NA
HSP90AA1 3320 ENSG00000080824 heat shock protein 90kDa alpha family class A member 1 The protein encoded by this gene is an inducible molecular chaperone that functions as a homodimer. The encoded protein aids in the proper folding of specific target proteins by use of an ATPase activity that is modulated by co-chaperones. Two transcript variants encoding different isoforms have been found for this gene. NA
TFCP2L1 29842 ENSG00000115112 transcription factor CP2-like 1 NA NA
APOD 347 ENSG00000189058 apolipoprotein D This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. NA
GOLGA8A 23015 ENSG00000175265 golgin A8 family member A The Golgi apparatus, which participates in glycosylation and transport of proteins and lipids in the secretory pathway, consists of a series of stacked, flattened membrane sacs referred to as cisternae. Interactions between the Golgi and microtubules are thought to be important for the reorganization of the Golgi after it fragments during mitosis. The golgins constitute a family of proteins which are localized to the Golgi. This gene encodes a golgin which structurally resembles its family member GOLGA2, suggesting that they may share a similar function. There are many similar copies of this gene on chromosome 15. Alternative splicing results in multiple transcript variants. NA
RHOB 388 ENSG00000143878 ras homolog family member B NA NA
RGL3 57139 ENSG00000205517 ral guanine nucleotide dissociation stimulator like 3 NA NA
CKM 1158 ENSG00000104879 creatine kinase, M-type The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. NA
CYB5A 1528 ENSG00000166347 cytochrome b5 type A The protein encoded by this gene is a membrane-bound cytochrome that reduces ferric hemoglobin (methemoglobin) to ferrous hemoglobin, which is required for stearyl-CoA-desaturase activity. Defects in this gene are a cause of type IV hereditary methemoglobinemia. Three transcript variants encoding different isoforms have been found for this gene. NA
HSPB7 27129 ENSG00000173641 heat shock protein family B (small) member 7 NA NA
SLC25A4 291 ENSG00000151729 solute carrier family 25 member 4 This gene is a member of the mitochondrial carrier subfamily of solute carrier protein genes. The product of this gene functions as a gated pore that translocates ADP from the cytoplasm into the mitochondrial matrix and ATP from the mitochondrial matrix into the cytoplasm. The protein forms a homodimer embedded in the inner mitochondrial membrane. Mutations in this gene have been shown to result in autosomal dominant progressive external ophthalmoplegia and familial hypertrophic cardiomyopathy. NA
DCXR 51181 ENSG00000169738 dicarbonyl/L-xylulose reductase The protein encoded by this gene acts as a homotetramer to catalyze diacetyl reductase and L-xylulose reductase reactions. The encoded protein may play a role in the uronate cycle of glucose metabolism and in the cellular osmoregulation in the proximal renal tubules. Defects in this gene are a cause of pentosuria. Two transcript variants encoding different isoforms have been found for this gene. NA
SLC4A11 83959 ENSG00000088836 solute carrier family 4 member 11 This gene encodes a voltage-regulated, electrogenic sodium-coupled borate cotransporter that is essential for borate homeostasis, cell growth and cell proliferation. Mutations in this gene have been associated with a number of endothelial corneal dystrophies including recessive corneal endothelial dystrophy 2, corneal dystrophy and perceptive deafness, and Fuchs endothelial corneal dystrophy. Multiple transcript variants encoding different isoforms have been described. NA
RP11-862L9.3 ENSG00000266844 ENSG00000266844 NA NA NA
PLEKHH1 57475 ENSG00000054690 pleckstrin homology, MyTH4 and FERM domain containing H1 NA NA
RP11-138I1.4 ENSG00000265401 ENSG00000265401 NA NA NA
GMPR 2766 ENSG00000137198 guanosine monophosphate reductase This gene encodes an enzyme that catalyzes the irreversible and NADPH-dependent reductive deamination of GMP to IMP. The protein also functions in the re-utilization of free intracellular bases and purine nucleosides. NA
NPPA 4878 ENSG00000175206 natriuretic peptide A The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. NA
PHF7 51533 ENSG00000010318 PHD finger protein 7 Spermatogenesis is a complex process regulated by extracellular and intracellular factors as well as cellular interactions among interstitial cells of the testis, Sertoli cells, and germ cells. This gene is expressed in the testis in Sertoli cells but not germ cells. The protein encoded by this gene contains plant homeodomain (PHD) finger domains, also known as leukemia associated protein (LAP) domains, believed to be involved in transcriptional regulation. The protein, which localizes to the nucleus of transfected cells, has been implicated in the transcriptional regulation of spermatogenesis. Alternate splicing results in multiple transcript variants of this gene. NA
RP5-940J5.9 ENSG00000269968 ENSG00000269968 NA NA NA
ANXA2 302 ENSG00000182718 annexin A2 This gene encodes a member of the annexin family. Members of this calcium-dependent phospholipid-binding protein family play a role in the regulation of cellular growth and in signal transduction pathways. This protein functions as an autocrine factor which heightens osteoclast formation and bone resorption. This gene has three pseudogenes located on chromosomes 4, 9 and 10, respectively. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
TPM2 7169 ENSG00000198467 tropomyosin 2 (beta) This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
MYOM1 8736 ENSG00000101605 myomesin 1 The giant protein titin, together with its associated proteins, interconnects the major structure of sarcomeres, the M bands and Z discs. The C-terminal end of the titin string extends into the M line, where it binds tightly to M-band constituents of apparent molecular masses of 190 kD (myomesin 1) and 165 kD (myomesin 2). This protein, myomesin 1, like myomesin 2, titin, and other myofibrillar proteins contains structural modules with strong homology to either fibronectin type III (motif I) or immunoglobulin C2 (motif II) domains. Myomesin 1 and myomesin 2 each have a unique N-terminal region followed by 12 modules of motif I or motif II, in the arrangement II-II-I-I-I-I-I-II-II-II-II-II. The two proteins share 50% sequence identity in this repeat-containing region. The head structure formed by these 2 proteins on one end of the titin string extends into the center of the M band. The integrating structure of the sarcomere arises from muscle-specific members of the superfamily of immunoglobulin-like proteins. Alternatively spliced transcript variants encoding different isoforms have been identified. NA
ADSSL1 122622 ENSG00000185100 adenylosuccinate synthase like 1 This gene encodes a member of the adenylosuccinate synthase family of proteins. The encoded muscle-specific enzyme plays a role in the purine nucleotide cycle by catalyzing the first step in the conversion of inosine monophosphate (IMP) to adenosine monophosphate (AMP). Mutations in this gene may cause adolescent onset distal myopathy. Alternative splicing results in multiple transcript variants. NA
REEP6 92840 ENSG00000115255 receptor accessory protein 6 NA NA
TPM3 7170 ENSG00000143549 tropomyosin 3 This gene encodes a member of the tropomyosin family of actin-binding proteins. Tropomyosins are dimers of coiled-coil proteins that provide stability to actin filaments and regulate access of other actin-binding proteins. Mutations in this gene result in autosomal dominant nemaline myopathy and other muscle disorders. This locus is involved in translocations with other loci, including anaplastic lymphoma receptor tyrosine kinase (ALK) and neurotrophic tyrosine kinase receptor type 1 (NTRK1), which result in the formation of fusion proteins that act as oncogenes. There are numerous pseudogenes for this gene on different chromosomes. Alternative splicing results in multiple transcript variants. NA
ECHS1 1892 ENSG00000127884 enoyl-CoA hydratase, short chain, 1, mitochondrial The protein encoded by this gene functions in the second step of the mitochondrial fatty acid beta-oxidation pathway. It catalyzes the hydration of 2-trans-enoyl-coenzyme A (CoA) intermediates to L-3-hydroxyacyl-CoAs. The gene product is a member of the hydratase/isomerase superfamily. It localizes to the mitochondrial matrix. Transcript variants utilizing alternative transcription initiation sites have been described in the literature. NA
SYNPO 11346 ENSG00000171992 synaptopodin Synaptopodin is an actin-associated protein that may play a role in actin-based cell shape and motility. The name synaptopodin derives from the protein’s associations with postsynaptic densities and dendritic spines and with renal podocytes (Mundel et al., 1997 [PubMed 9314539]). NA
RNASE1 6035 ENSG00000129538 ribonuclease A family member 1, pancreatic This gene encodes a member of the pancreatic-type of secretory ribonucleases, a subset of the ribonuclease A superfamily. The encoded endonuclease cleaves internal phosphodiester RNA bonds on the 3’-side of pyrimidine bases. It prefers poly(C) as a substrate and hydrolyzes 2’,3’-cyclic nucleotides, with a pH optimum near 8.0. The encoded protein is monomeric and more commonly acts to degrade ds-RNA over ss-RNA. Alternative splicing occurs at this locus and four transcript variants encoding the same protein have been identified. NA
HLA-B 3106 ENSG00000234745 major histocompatibility complex, class I, B HLA-B belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon 1 encodes the leader peptide, exon 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Hundreds of HLA-B alleles have been described. NA
OPTN 10133 ENSG00000123240 optineurin This gene encodes the coiled-coil containing protein optineurin. Optineurin may play a role in normal-tension glaucoma and adult-onset primary open angle glaucoma. Optineurin interacts with adenovirus E3-14.7K protein and may utilize tumor necrosis factor-alpha or Fas-ligand pathways to mediate apoptosis, inflammation or vasoconstriction. Optineurin may also function in cellular morphogenesis and membrane trafficking, vesicle trafficking, and transcription activation through its interactions with the RAB8, huntingtin, and transcription factor IIIA proteins. Alternative splicing results in multiple transcript variants encoding the same protein. NA
RARRES2 5919 ENSG00000106538 retinoic acid receptor responder 2 This gene encodes a secreted chemotactic protein that initiates chemotaxis via the ChemR23 G protein-coupled seven-transmembrane domain ligand. Expression of this gene is upregulated by the synthetic retinoid tazarotene and occurs in a wide variety of tissues. The active protein has several roles, including that as an adipokine and as an antimicrobial protein with activity against bacteria and fungi. NA
DCAF6 55827 ENSG00000143164 DDB1 and CUL4 associated factor 6 NA NA
IDH2 3418 ENSG00000182054 isocitrate dehydrogenase (NADP(+)) 2, mitochondrial Isocitrate dehydrogenases catalyze the oxidative decarboxylation of isocitrate to 2-oxoglutarate. These enzymes belong to two distinct subclasses, one of which utilizes NAD(+) as the electron acceptor and the other NADP(+). Five isocitrate dehydrogenases have been reported: three NAD(+)-dependent isocitrate dehydrogenases, which localize to the mitochondrial matrix, and two NADP(+)-dependent isocitrate dehydrogenases, one of which is mitochondrial and the other predominantly cytosolic. Each NADP(+)-dependent isozyme is a homodimer. The protein encoded by this gene is the NADP(+)-dependent isocitrate dehydrogenase found in the mitochondria. It plays a role in intermediary metabolism and energy production. This protein may tightly associate or interact with the pyruvate dehydrogenase complex. Alternative splicing results in multiple transcript variants. NA
AC017116.11 ENSG00000239775 ENSG00000239775 NA NA NA
CYP3A5 1577 ENSG00000106258 cytochrome P450 family 3 subfamily A member 5 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. The encoded protein metabolizes drugs as well as the steroid hormones testosterone and progesterone. This gene is part of a cluster of cytochrome P450 genes on chromosome 7q21.1. Two pseudogenes of this gene have been identified within this cluster on chromosome 7. Expression of this gene is widely variable among populations, and a single nucleotide polymorphism that affects transcript splicing has been associated with susceptibility to hypertension. Alternative splicing results in multiple transcript variants. NA
GLUD1 2746 ENSG00000148672 glutamate dehydrogenase 1 This gene encodes glutamate dehydrogenase, which is a mitochondrial matrix enzyme that catalyzes the oxidative deamination of glutamate to alpha-ketoglutarate and ammonia. This enzyme has an important role in regulating amino acid-induced insulin secretion. It is allosterically activated by ADP and inhibited by GTP and ATP. Activating mutations in this gene are a common cause of congenital hyperinsulinism. Alternative splicing of this gene results in multiple transcript variants. The related glutamate dehydrogenase 2 gene on the human X-chromosome originated from this gene via retrotransposition and encodes a soluble form of glutamate dehydrogenase. Related pseudogenes have been identified on chromosomes 10, 18 and X. NA
MYBPC1 4604 ENSG00000196091 myosin binding protein C, slow type This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
SCARB1 949 ENSG00000073060 scavenger receptor class B member 1 The protein encoded by this gene is a plasma membrane receptor for high density lipoprotein cholesterol (HDL). The encoded protein mediates cholesterol transfer to and from HDL. In addition, this protein is a receptor for hepatitis C virus glycoprotein E2. Two transcript variants encoding different isoforms have been found for this gene. NA
SCP2 6342 ENSG00000116171 sterol carrier protein 2 This gene encodes two proteins: sterol carrier protein X (SCPx) and sterol carrier protein 2 (SCP2), as a result of transcription initiation from 2 independently regulated promoters. The transcript initiated from the proximal promoter encodes the longer SCPx protein, and the transcript initiated from the distal promoter encodes the shorter SCP2 protein, with the 2 proteins sharing a common C-terminus. Evidence suggests that the SCPx protein is a peroxisome-associated thiolase that is involved in the oxidation of branched chain fatty acids, while the SCP2 protein is thought to be an intracellular lipid transfer protein. This gene is highly expressed in organs involved in lipid metabolism, and may play a role in Zellweger syndrome, in which cells are deficient in peroxisomes and have impaired bile acid synthesis. Alternative splicing of this gene produces multiple transcript variants, some encoding different isoforms. NA
A2M 2 ENSG00000175899 alpha-2-macroglobulin Alpha-2-macroglobulin is a protease inhibitor and cytokine transporter. It inhibits many proteases, including trypsin, thrombin and collagenase. A2M is implicated in Alzheimer disease (AD) due to its ability to mediate the clearance and degradation of A-beta, the major component of beta-amyloid deposits. NA
MAST2 23139 ENSG00000086015 microtubule associated serine/threonine kinase 2 NA NA
ALDOA 226 ENSG00000149925 aldolase, fructose-bisphosphate A The protein encoded by this gene, Aldolase A (fructose-bisphosphate aldolase), is a glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Three aldolase isozymes (A, B, and C), encoded by three different genes, are differentially expressed during development. Aldolase A is found in the developing embryo and is produced in even greater amounts in adult muscle. Aldolase A expression is repressed in adult liver, kidney and intestine and similar to aldolase C levels in brain and other nervous tissue. Aldolase A deficiency has been associated with myopathy and hemolytic anemia. Alternative splicing and alternative promoter usage results in multiple transcript variants. Related pseudogenes have been identified on chromosomes 3 and 10. NA
PPP1R3C 5507 ENSG00000119938 protein phosphatase 1 regulatory subunit 3C This gene encodes a regulatory subunit of protein phosphatase-1 (PP1). PP1 catalyzes reversible protein phosphorylation, which is important in a wide range of cellular activities: neuronal, muscular, RNA splicing, protein synthesis, cell death, and glycogen metabolism, to name just a few. By interacting with different regulatory subunits, PP1 is directed to different parts of the cell, to different substrates, or to respond to extracellular signals. NA
GPT 2875 ENSG00000167701 glutamic-pyruvate transaminase (alanine aminotransferase) This gene encodes cytosolic alanine aminotransaminase 1 (ALT1); also known as glutamate-pyruvate transaminase 1. This enzyme catalyzes the reversible transamination between alanine and 2-oxoglutarate to generate pyruvate and glutamate and, therefore, plays a key role in the intermediary metabolism of glucose and amino acids. Serum activity levels of this enzyme are routinely used as a biomarker of liver injury caused by drug toxicity, infection, alcohol, and steatosis. A related gene on chromosome 16 encodes a putative mitochondrial alanine aminotransaminase. NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_",16,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
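
The query-and-write step is repeated for each factor with only the factor index changing. A minimal sketch of how the write step could be wrapped in a helper is given below. The function name `write_factor_genes` and its `out_dir` argument are hypothetical and not part of the original analysis; the toy data frame reuses three rows from the Factor 17 annotations, and `writeLines` is swapped in for `write.table` (it produces the same one-ID-per-line file).

```r
# Hypothetical helper (not in the original analysis): write the queried
# Ensembl IDs for factor k to "gene_names_clus_<k>.txt", one ID per line.
# `out` is assumed to carry a `query` column, as the queryMany() results do.
write_factor_genes <- function(out, k, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  writeLines(as.character(out$query), path)
  invisible(path)
}

# Toy input built from three rows of the Factor 17 annotations.
out_demo <- data.frame(
  symbol = c("TNS1", "SPARCL1", "MFGE8"),
  query  = c("ENSG00000079308", "ENSG00000152583", "ENSG00000140545"),
  stringsAsFactors = FALSE
)

# Write to a temporary directory so the sketch does not touch ../utilities.
demo_path <- write_factor_genes(out_demo, 17, tempdir())
```

In the analysis itself, the helper would be called once per factor after the corresponding `mygene::queryMany()` call, with `out_dir` pointing at `../utilities/GTEX2013_sparse_load_sqrt`.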

Factor 17 Annotations

out <- mygene::queryMany(gene_list[17,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id name summary symbol query notfound
7145 tensin 1 The protein encoded by this gene localizes to focal adhesions, regions of the plasma membrane where the cell attaches to the extracellular matrix. This protein crosslinks actin filaments and contains a Src homology 2 (SH2) domain, which is often found in molecules involved in signal transduction. This protein is a substrate of calpain II. Alternative splicing results in multiple transcript variants encoding different isoforms. TNS1 ENSG00000079308 NA
8404 SPARC like 1 NA SPARCL1 ENSG00000152583 NA
4240 milk fat globule-EGF factor 8 protein This gene encodes a preproprotein that is proteolytically processed to form multiple protein products. The major encoded protein product, lactadherin, is a membrane glycoprotein that promotes phagocytosis of apoptotic cells. This protein has also been implicated in wound healing, autoimmune disease, and cancer. Lactadherin can be further processed to form a smaller cleavage product, medin, which comprises the major protein component of aortic medial amyloid (AMA). Alternative splicing results in multiple transcript variants. MFGE8 ENSG00000140545 NA
7431 vimentin This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. VIM ENSG00000026025 NA
8490 regulator of G-protein signaling 5 This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. RGS5 ENSG00000143248 NA
5549 proline and arginine rich end leucine rich repeat protein The protein encoded by this gene is a leucine-rich repeat protein present in connective tissue extracellular matrix. This protein functions as a molecule anchoring basement membranes to the underlying connective tissue. This protein has been shown to bind type I collagen to basement membranes and type II collagen to cartilage. It also binds the basement membrane heparan sulfate proteoglycan perlecan. This protein is suggested to be involved in the pathogenesis of Hutchinson-Gilford progeria (HGP), which is reported to lack the binding of collagen in basement membranes and cartilage. Alternatively spliced transcript variants encoding the same protein have been observed. PRELP ENSG00000188783 NA
7078 TIMP metallopeptidase inhibitor 3 This gene belongs to the TIMP gene family. The proteins encoded by this gene family are inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix (ECM). Expression of this gene is induced in response to mitogenic stimulation and this netrin domain-containing protein is localized to the ECM. Mutations in this gene have been associated with the autosomal dominant disorder Sorsby’s fundus dystrophy. TIMP3 ENSG00000100234 NA
4256 matrix Gla protein The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. MGP ENSG00000111341 NA
3490 insulin like growth factor binding protein 7 This gene encodes a member of the insulin-like growth factor (IGF)-binding protein (IGFBP) family. IGFBPs bind IGFs with high affinity, and regulate IGF availability in body fluids and tissues and modulate IGF binding to its receptors. This protein binds IGF-I and IGF-II with relatively low affinity, and belongs to a subfamily of low-affinity IGFBPs. It also stimulates prostacyclin production and cell adhesion. Alternatively spliced transcript variants encoding different isoforms have been described for this gene, and one variant has been associated with retinal arterial macroaneurysm (PMID:21835307). IGFBP7 ENSG00000163453 NA
7052 transglutaminase 2 Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene acts as a monomer, is induced by retinoic acid, and appears to be involved in apoptosis. Finally, the encoded protein is the autoantigen implicated in celiac disease. Two transcript variants encoding different isoforms have been found for this gene. TGM2 ENSG00000198959 NA
165 AE binding protein 1 This gene encodes a member of carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. AEBP1 ENSG00000106624 NA
51747 LUC7 like 3 pre-mRNA splicing factor This gene encodes a protein with an N-terminal half that contains cysteine/histidine motifs and leucine zipper-like repeats, and the C-terminal half is rich in arginine and glutamate residues (RE domain) and arginine and serine residues (RS domain). This protein localizes with a speckled pattern in the nucleus, and could be involved in the formation of the spliceosome via the RE and RS domains. Two alternatively spliced transcript variants encoding the same protein have been found for this gene. LUC7L3 ENSG00000108848 NA
23524 serine/arginine repetitive matrix 2 NA SRRM2 ENSG00000167978 NA
25802 leiomodin 1 The leiomodin 1 protein has a putative membrane-spanning region and 2 types of tandemly repeated blocks. The transcript is expressed in all tissues tested, with the highest levels in thyroid, eye muscle, skeletal muscle, and ovary. Increased expression of leiomodin 1 may be linked to Graves’ disease and thyroid-associated ophthalmopathy. LMOD1 ENSG00000163431 NA
116983 ArfGAP with coiled-coil, ankyrin repeat and PH domains 3 NA ACAP3 ENSG00000131584 NA
2034 endothelial PAS domain protein 1 This gene encodes a transcription factor involved in the induction of genes regulated by oxygen, which is induced as oxygen levels fall. The encoded protein contains a basic-helix-loop-helix domain protein dimerization domain as well as a domain found in proteins in signal transduction pathways which respond to oxygen levels. Mutations in this gene are associated with erythrocytosis familial type 4. EPAS1 ENSG00000116016 NA
6625 small nuclear ribonucleoprotein U1 subunit 70 NA SNRNP70 ENSG00000104852 NA
11034 destrin, actin depolymerizing factor The product of this gene belongs to the actin-binding proteins ADF family. This family of proteins is responsible for enhancing the turnover rate of actin in vivo. This gene encodes the actin depolymerizing protein that severs actin filaments (F-actin) and binds to actin monomers (G-actin). Two transcript variants encoding distinct isoforms have been identified for this gene. DSTN ENSG00000125868 NA
4026 LIM domain containing preferred translocation partner in lipoma This gene encodes a member of a subfamily of LIM domain proteins that are characterized by an N-terminal proline-rich region and three C-terminal LIM domains. The encoded protein localizes to the cell periphery in focal adhesions and may be involved in cell-cell adhesion and cell motility. This protein also shuttles through the nucleus and may function as a transcriptional co-activator. This gene is located at the junction of certain disease-related chromosomal translocations, which result in the expression of chimeric proteins that may promote tumor growth. Alternative splicing results in multiple transcript variants. LPP ENSG00000145012 NA
2 alpha-2-macroglobulin Alpha-2-macroglobulin is a protease inhibitor and cytokine transporter. It inhibits many proteases, including trypsin, thrombin and collagenase. A2M is implicated in Alzheimer disease (AD) due to its ability to mediate the clearance and degradation of A-beta, the major component of beta-amyloid deposits. A2M ENSG00000175899 NA
3983 actin binding LIM protein 1 This gene encodes a cytoskeletal LIM protein that binds to actin filaments via a domain that is homologous to erythrocyte dematin. LIM domains, found in over 60 proteins, play key roles in the regulation of developmental pathways. LIM domains also function as protein-binding interfaces, mediating specific protein-protein interactions. The protein encoded by this gene could mediate such interactions between actin filaments and cytoplasmic targets. Alternatively spliced transcript variants encoding different isoforms have been identified. ABLIM1 ENSG00000099204 NA
5310 polycystin 1, transient receptor potential channel interacting This gene encodes a member of the polycystin protein family. The encoded glycoprotein contains a large N-terminal extracellular region, multiple transmembrane domains and a cytoplasmic C-tail. It is an integral membrane protein that functions as a regulator of calcium permeable cation channels and intracellular calcium homoeostasis. It is also involved in cell-cell/matrix interactions and may modulate G-protein-coupled signal-transduction pathways. It plays a role in renal tubular development, and mutations in this gene cause autosomal dominant polycystic kidney disease type 1 (ADPKD1). ADPKD1 is characterized by the growth of fluid-filled cysts that replace normal renal tissue and result in end-stage renal failure. Splice variants encoding different isoforms have been noted for this gene. Also, six pseudogenes, closely linked in a known duplicated region on chromosome 16p, have been described. PKD1 ENSG00000008710 NA
1282 collagen type IV alpha 1 chain This gene encodes a type IV collagen alpha protein. Type IV collagen proteins are integral components of basement membranes. This gene shares a bidirectional promoter with a paralogous gene on the opposite strand. The protein consists of an amino-terminal 7S domain, a triple-helix forming collagenous domain, and a carboxy-terminal non-collagenous domain. It functions as part of a heterotrimer and interacts with other extracellular matrix components such as perlecans, proteoglycans, and laminins. In addition, proteolytic cleavage of the non-collagenous carboxy-terminal domain results in a biologically active fragment known as arresten, which has anti-angiogenic and tumor suppressor properties. Mutations in this gene cause porencephaly, cerebrovascular disease, and renal and muscular defects. Alternative splicing results in multiple transcript variants. COL4A1 ENSG00000187498 NA
388 ras homolog family member B NA RHOB ENSG00000143878 NA
7094 talin 1 This gene encodes a cytoskeletal protein that is concentrated in areas of cell-substratum and cell-cell contacts. The encoded protein plays a significant role in the assembly of actin filaments and in spreading and migration of various cell types, including fibroblasts and osteoclasts. It codistributes with integrins in the cell surface membrane in order to assist in the attachment of adherent cells to extracellular matrices and of lymphocytes to other cells. The N-terminus of this protein contains elements for localization to cell-extracellular matrix junctions. The C-terminus contains binding sites for proteins such as beta-1-integrin, actin, and vinculin. TLN1 ENSG00000137076 NA
85301 collagen type XXVII alpha 1 This gene encodes a member of the fibrillar collagen family, and plays a role during the calcification of cartilage and the transition of cartilage to bone. The encoded protein product is a preproprotein. It includes an N-terminal signal peptide, which is followed by an N-terminal propeptide, mature peptide and a C-terminal propeptide. The N-terminal propeptide contains thrombospondin N-terminal-like and laminin G-like domains. The mature peptide is a major triple-helical region. The C-terminal propeptide, also known as COLFI domain, plays crucial roles in tissue growth and repair. Mutations in this gene cause Steel syndrome. Alternatively spliced transcript variants have been found, but the full-length nature of some variants has not been determined. COL27A1 ENSG00000196739 NA
1284 collagen type IV alpha 2 This gene encodes one of the six subunits of type IV collagen, the major structural component of basement membranes. The C-terminal portion of the protein, known as canstatin, is an inhibitor of angiogenesis and tumor growth. Like the other members of the type IV collagen gene family, this gene is organized in a head-to-head conformation with another type IV collagen gene so that each gene pair shares a common promoter. COL4A2 ENSG00000134871 NA
8522 growth arrest specific 7 Growth arrest-specific 7 is expressed primarily in terminally differentiated brain cells and predominantly in mature cerebellar Purkinje neurons. GAS7 plays a putative role in neuronal development. Several transcript variants encoding proteins which vary in the N-terminus have been described. GAS7 ENSG00000007237 NA
6876 transgelin The protein encoded by this gene is a transformation and shape-change sensitive actin cross-linking/gelling protein found in fibroblasts and smooth muscle. Its expression is down-regulated in many cell lines, and this down-regulation may be an early and sensitive marker for the onset of transformation. A functional role of this protein is unclear. Two transcript variants encoding the same protein have been found for this gene. TAGLN ENSG00000149591 NA
7169 tropomyosin 2 (beta) This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. TPM2 ENSG00000198467 NA
23022 palladin, cytoskeletal associated protein This gene encodes a cytoskeletal protein that is required for organizing the actin cytoskeleton. The protein is a component of actin-containing microfilaments, and it is involved in the control of cell shape, adhesion, and contraction. Polymorphisms in this gene are associated with a susceptibility to pancreatic cancer type 1, and also with a risk for myocardial infarction. Alternative splicing results in multiple transcript variants. PALLD ENSG00000129116 NA
23015 golgin A8 family member A The Golgi apparatus, which participates in glycosylation and transport of proteins and lipids in the secretory pathway, consists of a series of stacked, flattened membrane sacs referred to as cisternae. Interactions between the Golgi and microtubules are thought to be important for the reorganization of the Golgi after it fragments during mitosis. The golgins constitute a family of proteins which are localized to the Golgi. This gene encodes a golgin which structurally resembles its family member GOLGA2, suggesting that they may share a similar function. There are many similar copies of this gene on chromosome 15. Alternative splicing results in multiple transcript variants. GOLGA8A ENSG00000175265 NA
20 ATP binding cassette subfamily A member 2 The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intracellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the ABC1 subfamily. Members of the ABC1 subfamily comprise the only major ABC subfamily found exclusively in multicellular eukaryotes. This protein is highly expressed in brain tissue and may play a role in macrophage lipid metabolism and neural development. Two transcript variants encoding different isoforms have been found for this gene. ABCA2 ENSG00000107331 NA
4155 myelin basic protein The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. MBP ENSG00000197971 NA
2026 enolase 2 This gene encodes one of the three enolase isoenzymes found in mammals. This isoenzyme, a homodimer, is found in mature neurons and cells of neuronal origin. A switch from alpha enolase to gamma enolase occurs in neural tissue during development in rats and primates. ENO2 ENSG00000111674 NA
23048 formin binding protein 1 The protein encoded by this gene is a member of the formin-binding-protein family. The protein contains an N-terminal Fer/Cdc42-interacting protein 4 (CIP4) homology (FCH) domain followed by a coiled-coil domain, a proline-rich motif, a second coiled-coil domain, a Rho family protein-binding domain (RBD), and a C-terminal SH3 domain. This protein binds sorting nexin 2 (SNX2), tankyrase (TNKS), and dynamin; an interaction between this protein and formin has not been demonstrated yet in human. FNBP1 ENSG00000187239 NA
140710 suppressor of glucose, autophagy associated 1 NA SOGA1 ENSG00000149639 NA
8497 PTPRF interacting protein alpha 4 PPFIA4, or liprin-alpha-4, belongs to the liprin-alpha gene family. See liprin-alpha-1 (LIP1, or PPFIA1; MIM 611054) for background on liprins. PPFIA4 ENSG00000143847 NA
6430 serine and arginine rich splicing factor 5 The protein encoded by this gene is a member of the serine/arginine (SR)-rich family of pre-mRNA splicing factors, which constitute part of the spliceosome. Each of these factors contains an RNA recognition motif (RRM) for binding RNA and an RS domain for binding other proteins. The RS domain is rich in serine and arginine residues and facilitates interaction between different SR splicing factors. In addition to being critical for mRNA splicing, the SR proteins have also been shown to be involved in mRNA export from the nucleus and in translation. Alternative splicing results in multiple transcript variants. SRSF5 ENSG00000100650 NA
10580 sorbin and SH3 domain containing 1 This gene encodes a CBL-associated protein which functions in the signaling and stimulation of insulin. Mutations in this gene may be associated with human disorders of insulin resistance. Alternative splicing results in multiple transcript variants. SORBS1 ENSG00000095637 NA
7089 transducin like enhancer of split 2 NA TLE2 ENSG00000065717 NA
27129 heat shock protein family B (small) member 7 NA HSPB7 ENSG00000173641 NA
NA NA NA NA ENSG00000256309 TRUE
25932 chloride intracellular channel 4 Chloride channels are a diverse group of proteins that regulate fundamental cellular processes including stabilization of cell membrane potential, transepithelial transport, maintenance of intracellular pH, and regulation of cell volume. Chloride intracellular channel 4 (CLIC4) protein, encoded by the CLIC4 gene, is a member of the p64 family; the gene is expressed in many tissues and exhibits an intracellular vesicular pattern in Panc-1 cells (pancreatic cancer cells). CLIC4 ENSG00000169504 NA
1464 chondroitin sulfate proteoglycan 4 A human melanoma-associated chondroitin sulfate proteoglycan plays a role in stabilizing cell-substratum interactions during early events of melanoma cell spreading on endothelial basement membranes. CSPG4 represents an integral membrane chondroitin sulfate proteoglycan expressed by human malignant melanoma cells. CSPG4 ENSG00000173546 NA
87 actinin alpha 1 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, cytoskeletal, alpha actinin isoform and maps to the same site as the structurally similar erythroid beta spectrin gene. Three transcript variants encoding different isoforms have been found for this gene. ACTN1 ENSG00000072110 NA
11188 nischarin This gene encodes a nonadrenergic imidazoline-1 receptor protein that localizes to the cytosol and anchors to the inner layer of the plasma membrane. The orthologous mouse protein has been shown to influence cytoskeletal organization and cell migration by binding to alpha-5-beta-1 integrin. In humans, this protein has been shown to bind to the adapter insulin receptor substrate 4 (IRS4) to mediate translocation of alpha-5 integrin from the cell membrane to endosomes. Expression of this protein was reduced in human breast cancers while its overexpression reduced tumor growth and metastasis; possibly by limiting the expression of alpha-5 integrin. In human cardiac tissue, this gene was found to affect cell growth and death while in neural tissue it affected neuronal growth and differentiation. Alternative splicing results in multiple transcript variants encoding different isoforms. Some isoforms lack the expected C-terminal domains of a functional imidazoline receptor. NISCH ENSG00000010322 NA
25957 PNN interacting serine and arginine rich protein NA PNISR ENSG00000132424 NA
10645 calcium/calmodulin-dependent protein kinase kinase 2 The product of this gene belongs to the Serine/Threonine protein kinase family, and to the Ca(2+)/calmodulin-dependent protein kinase subfamily. The major isoform of this gene plays a role in the calcium/calmodulin-dependent (CaM) kinase cascade by phosphorylating the downstream kinases CaMK1 and CaMK4. Protein products of this gene also phosphorylate AMP-activated protein kinase (AMPK). This gene has its strongest expression in the brain and influences signalling cascades involved with learning and memory, neuronal differentiation and migration, neurite outgrowth, and synapse formation. Alternative splicing results in multiple transcript variants encoding distinct isoforms. The identified isoforms differ in their ability to undergo autophosphorylation and to phosphorylate downstream kinases. CAMKK2 ENSG00000110931 NA
64787 EPS8 like 2 This gene encodes a member of the EPS8 gene family. The encoded protein, like other members of the family, is thought to link growth factor stimulation to actin organization, generating functional redundancy in the pathways that regulate actin cytoskeletal remodeling. EPS8L2 ENSG00000177106 NA
81669 cyclin L2 The protein encoded by this gene belongs to the cyclin family. Through its interaction with several proteins, such as RNA polymerase II, splicing factors, and cyclin-dependent kinases, this protein functions as a regulator of the pre-mRNA splicing process, as well as in inducing apoptosis by modulating the expression of apoptotic and antiapoptotic proteins. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. CCNL2 ENSG00000221978 NA
7038 thyroglobulin Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. TG ENSG00000042832 NA
3678 integrin subunit alpha 5 The product of this gene belongs to the integrin alpha chain family. Integrins are heterodimeric integral membrane proteins composed of an alpha subunit and a beta subunit that function in cell surface adhesion and signaling. The encoded preproprotein is proteolytically processed to generate light and heavy chains that comprise the alpha 5 subunit. This subunit associates with the beta 1 subunit to form a fibronectin receptor. This integrin may promote tumor invasion, and higher expression of this gene may be correlated with shorter survival time in lung cancer patients. Note that the integrin alpha 5 and integrin alpha V subunits are encoded by distinct genes. ITGA5 ENSG00000161638 NA
1476 cystatin B The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and kininogens. This gene encodes a stefin that functions as an intracellular thiol protease inhibitor. The protein is able to form a dimer stabilized by noncovalent forces, inhibiting papain and cathepsins l, h and b. The protein is thought to play a role in protecting against the proteases leaking from lysosomes. Evidence indicates that mutations in this gene are responsible for the primary defects in patients with progressive myoclonic epilepsy (EPM1). CSTB ENSG00000160213 NA
9145 synaptogyrin 1 This gene encodes an integral membrane protein associated with presynaptic vesicles in neuronal cells. The exact function of this protein is unclear, but studies of a similar murine protein suggest that it functions in synaptic plasticity without being required for synaptic transmission. The gene product belongs to the synaptogyrin gene family. Three alternatively spliced variants encoding three different isoforms have been identified. SYNGR1 ENSG00000100321 NA
NA NA NA NA ENSG00000163486 TRUE
4629 myosin, heavy chain 11, smooth muscle The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. MYH11 ENSG00000133392 NA
9651 phospholipase C eta 2 PLCH2 is a member of the PLC-eta family of the phosphoinositide-specific phospholipase C (PLC) superfamily of enzymes that cleave PtdIns(4,5) P2 to generate second messengers inositol 1,4,5-trisphosphate and diacylglycerol (Zhou et al., 2005 [PubMed 16107206]). PLCH2 ENSG00000149527 NA
3043 hemoglobin subunit beta The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. HBB ENSG00000244734 NA
9315 neuronal regeneration related protein NA NREP ENSG00000134986 NA
5900 ral guanine nucleotide dissociation stimulator Guanine nucleotide dissociation stimulators (GDSs, or exchange factors), such as RALGDS, are effectors of Ras-related GTPases (see MIM 190020) that participate in signaling for a variety of cellular processes. RALGDS ENSG00000160271 NA
3688 integrin subunit beta 1 Integrins are heterodimeric proteins made up of alpha and beta subunits. At least 18 alpha and 8 beta subunits have been described in mammals. Integrin family members are membrane receptors involved in cell adhesion and recognition in a variety of processes including embryogenesis, hemostasis, tissue repair, immune response and metastatic diffusion of tumor cells. This gene encodes a beta subunit. Multiple alternatively spliced transcript variants which encode different protein isoforms have been found for this gene. ITGB1 ENSG00000150093 NA
11346 synaptopodin Synaptopodin is an actin-associated protein that may play a role in actin-based cell shape and motility. The name synaptopodin derives from the protein’s associations with postsynaptic densities and dendritic spines and with renal podocytes (Mundel et al., 1997 [PubMed 9314539]). SYNPO ENSG00000171992 NA
5054 serpin family E member 1 This gene encodes a member of the serine proteinase inhibitor (serpin) superfamily. This member is the principal inhibitor of tissue plasminogen activator (tPA) and urokinase (uPA), and hence is an inhibitor of fibrinolysis. Defects in this gene are the cause of plasminogen activator inhibitor-1 deficiency (PAI-1 deficiency), and high concentrations of the gene product are associated with thrombophilia. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. SERPINE1 ENSG00000106366 NA
1153 cold inducible RNA binding protein NA CIRBP ENSG00000099622 NA
476 ATPase Na+/K+ transporting subunit alpha 1 The protein encoded by this gene belongs to the family of P-type cation transport ATPases, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The catalytic subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes an alpha 1 subunit. Multiple transcript variants encoding different isoforms have been found for this gene. ATP1A1 ENSG00000163399 NA
8639 amine oxidase, copper containing 3 This gene encodes a member of the semicarbazide-sensitive amine oxidase family. Copper amine oxidases catalyze the oxidative conversion of amines to aldehydes in the presence of copper and quinone cofactor. The encoded protein is localized to the cell surface, has adhesive properties as well as monoamine oxidase activity, and may be involved in leukocyte trafficking. Alterations in levels of the encoded protein may be associated with many diseases, including diabetes mellitus. A pseudogene of this gene has been described and is located approximately 9-kb downstream on the same chromosome. Alternative splicing results in multiple transcript variants. AOC3 ENSG00000131471 NA
2289 FK506 binding protein 5 The protein encoded by this gene is a member of the immunophilin protein family, which play a role in immunoregulation and basic cellular processes involving protein folding and trafficking. This encoded protein is a cis-trans prolyl isomerase that binds to the immunosuppressants FK506 and rapamycin. It is thought to mediate calcineurin inhibition. It also interacts functionally with mature hetero-oligomeric progesterone receptor complexes along with the 90 kDa heat shock protein and P23 protein. This gene has been found to have multiple polyadenylation sites. Alternative splicing results in multiple transcript variants. FKBP5 ENSG00000096060 NA
7074 T-cell lymphoma invasion and metastasis 1 NA TIAM1 ENSG00000156299 NA
283234 coiled-coil domain containing 88B This gene encodes a member of the hook-related protein family. Members of this family are characterized by an N-terminal potential microtubule binding domain, a central coiled-coiled and a C-terminal Hook-related domain. The encoded protein may be involved in linking organelles to microtubules. CCDC88B ENSG00000168071 NA
57185 NIPA like domain containing 3 NA NIPAL3 ENSG00000001461 NA
9510 ADAM metallopeptidase with thrombospondin type 1 motif 1 This gene encodes a member of the ADAMTS (a disintegrin and metalloproteinase with thrombospondin motif) protein family. Members of the family share several distinct protein modules, including a propeptide region, a metalloproteinase domain, a disintegrin-like domain, and a thrombospondin type 1 (TS) motif. Individual members of this family differ in the number of C-terminal TS motifs, and some have unique C-terminal domains. The protein encoded by this gene contains two disintegrin loops and three C-terminal TS motifs and has anti-angiogenic activity. The expression of this gene may be associated with various inflammatory processes as well as development of cancer cachexia. This gene is likely to be necessary for normal growth, fertility, and organ morphology and function. ADAMTS1 ENSG00000154734 NA
23654 plexin B2 Members of the B class of plexins, such as PLXNB2 are transmembrane receptors that participate in axon guidance and cell migration in response to semaphorins (Perrot et al. (2002) [PubMed 12183458]). PLXNB2 ENSG00000196576 NA
2878 glutathione peroxidase 3 This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. GPX3 ENSG00000211445 NA
283450 HECT domain E3 ubiquitin protein ligase 4 NA HECTD4 ENSG00000173064 NA
5166 pyruvate dehydrogenase kinase 4 This gene is a member of the PDK/BCKDK protein kinase family and encodes a mitochondrial protein with a histidine kinase domain. This protein is located in the matrix of the mitochondria and inhibits the pyruvate dehydrogenase complex by phosphorylating one of its subunits, thereby contributing to the regulation of glucose metabolism. Expression of this gene is regulated by glucocorticoids, retinoic acid and insulin. PDK4 ENSG00000004799 NA
1294 collagen type VII alpha 1 This gene encodes the alpha chain of type VII collagen. The type VII collagen fibril, composed of three identical alpha collagen chains, is restricted to the basement zone beneath stratified squamous epithelia. It functions as an anchoring fibril between the external epithelia and the underlying stroma. Mutations in this gene are associated with all forms of dystrophic epidermolysis bullosa. In the absence of mutations, however, an acquired form of this disease can result from an autoimmune response made to type VII collagen. COL7A1 ENSG00000114270 NA
112476 proline rich transmembrane protein 2 This gene encodes a transmembrane protein containing a proline-rich domain in its N-terminal half. Studies in mice suggest that it is predominantly expressed in brain and spinal cord in embryonic and postnatal stages. Mutations in this gene are associated with episodic kinesigenic dyskinesia-1. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. PRRT2 ENSG00000167371 NA
266727 MAM domain containing glycosylphosphatidylinositol anchor 1 NA MDGA1 ENSG00000112139 NA
1490 connective tissue growth factor The protein encoded by this gene is a mitogen that is secreted by vascular endothelial cells. The encoded protein plays a role in chondrocyte proliferation and differentiation, cell adhesion in many cell types, and is related to platelet-derived growth factor. Certain polymorphisms in this gene have been linked with a higher incidence of systemic sclerosis. CTGF ENSG00000118523 NA
3371 tenascin C This gene encodes an extracellular matrix protein with a spatially and temporally restricted tissue distribution. This protein is homohexameric with disulfide-linked subunits, and contains multiple EGF-like and fibronectin type-III domains. It is implicated in guidance of migrating neurons as well as axons during development, synaptic plasticity, and neuronal regeneration. TNC ENSG00000041982 NA
23263 MCF.2 cell line derived transforming sequence like This gene encodes a guanine nucleotide exchange factor that interacts specifically with the GTP-bound Rac1 and plays a role in the Rho/Rac signaling pathways. A variant in this gene was associated with osteoarthritis. Alternative splicing results in multiple transcript variants. MCF2L ENSG00000126217 NA
800 caldesmon 1 This gene encodes a calmodulin- and actin-binding protein that plays an essential role in the regulation of smooth muscle and nonmuscle contraction. The conserved domain of this protein possesses the binding activities to Ca(2+)-calmodulin, actin, tropomyosin, myosin, and phospholipids. This protein is a potent inhibitor of the actin-tropomyosin activated myosin MgATPase, and serves as a mediating factor for Ca(2+)-dependent inhibition of smooth muscle contraction. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. CALD1 ENSG00000122786 NA
3866 keratin 15 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains and are clustered in a region on chromosome 17q21.2. KRT15 ENSG00000171346 NA
7077 TIMP metallopeptidase inhibitor 2 This gene is a member of the TIMP gene family. The proteins encoded by this gene family are natural inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix. In addition to an inhibitory role against metalloproteinases, the encoded protein has a unique role among TIMP family members in its ability to directly suppress the proliferation of endothelial cells. As a result, the encoded protein may be critical to the maintenance of tissue homeostasis by suppressing the proliferation of quiescent tissues in response to angiogenic factors, and by inhibiting protease activity in tissues undergoing remodelling of the extracellular matrix. TIMP2 ENSG00000035862 NA
4162 melanoma cell adhesion molecule NA MCAM ENSG00000076706 NA
55636 chromodomain helicase DNA binding protein 7 This gene encodes a protein that contains several helicase family domains. Mutations in this gene have been found in some patients with the CHARGE syndrome. Two transcript variants encoding different isoforms have been found for this gene. CHD7 ENSG00000171316 NA
57449 pleckstrin homology and RhoGEF domain containing G5 This gene encodes a protein that activates the nuclear factor kappa B (NFKB1) signaling pathway. Mutations in this gene are associated with autosomal recessive distal spinal muscular atrophy. Multiple transcript variants encoding different isoforms have been found for this gene. PLEKHG5 ENSG00000171680 NA
130733 transmembrane protein 178A NA TMEM178A ENSG00000152154 NA
23162 mitogen-activated protein kinase 8 interacting protein 3 The protein encoded by this gene shares similarity with the product of Drosophila syd gene, required for the functional interaction of kinesin I with axonal cargo. Studies of the similar gene in mouse suggested that this protein may interact with, and regulate the activity of numerous protein kinases of the JNK signaling pathway, and thus function as a scaffold protein in neuronal cells. The C. elegans counterpart of this gene is found to regulate synaptic vesicle transport possibly by integrating JNK signaling and kinesin-1 transport. Several alternatively spliced transcript variants of this gene have been described, but the full-length nature of some of these variants has not been determined. MAPK8IP3 ENSG00000138834 NA
123 perilipin 2 The protein encoded by this gene belongs to the perilipin family, members of which coat intracellular lipid storage droplets. This protein is associated with the lipid globule surface membrane material, and may be involved in development and maintenance of adipose tissue. However, it is not restricted to adipocytes as previously thought, but is found in a wide range of cultured cell lines, including fibroblasts, endothelial and epithelial cells, and tissues, such as lactating mammary gland, adrenal cortex, Sertoli and Leydig cells, and hepatocytes in alcoholic liver cirrhosis, suggesting that it may serve as a marker of lipid accumulation in diverse cell types and diseases. Alternatively spliced transcript variants have been found for this gene. PLIN2 ENSG00000147872 NA
10516 fibulin 5 The protein encoded by this gene is a secreted, extracellular matrix protein containing an Arg-Gly-Asp (RGD) motif and calcium-binding EGF-like domains. It promotes adhesion of endothelial cells through interaction of integrins and the RGD motif. It is prominently expressed in developing arteries but less so in adult vessels. However, its expression is reinduced in balloon-injured vessels and atherosclerotic lesions, notably in intimal vascular smooth muscle cells and endothelial cells. Therefore, the protein encoded by this gene may play a role in vascular development and remodeling. Defects in this gene are a cause of autosomal dominant cutis laxa, autosomal recessive cutis laxa type I (CL type I), and age-related macular degeneration type 3 (ARMD3). FBLN5 ENSG00000140092 NA
7414 vinculin Vinculin is a cytoskeletal protein associated with cell-cell and cell-matrix junctions, where it is thought to function as one of several interacting proteins involved in anchoring F-actin to the membrane. Defects in VCL are the cause of cardiomyopathy dilated type 1W. Dilated cardiomyopathy is a disorder characterized by ventricular dilation and impaired systolic function, resulting in congestive heart failure and arrhythmia. Multiple alternatively spliced transcript variants have been found for this gene, but the biological validity of some variants has not been determined. VCL ENSG00000035403 NA
126393 heat shock protein family B (small) member 6 This locus encodes a heat shock protein. The encoded protein likely plays a role in smooth muscle relaxation. HSPB6 ENSG00000004776 NA
6812 syntaxin binding protein 1 This gene encodes a syntaxin-binding protein. The encoded protein appears to play a role in release of neurotransmitters via regulation of syntaxin, a transmembrane attachment protein receptor. Mutations in this gene have been associated with infantile epileptic encephalopathy-4. Alternatively spliced transcript variants have been described. STXBP1 ENSG00000136854 NA
729359 perilipin 4 Members of the perilipin family, such as PLIN4, coat intracellular lipid storage droplets (Wolins et al., 2003 [PubMed 12840023]). PLIN4 ENSG00000167676 NA
10472 zinc finger and BTB domain containing 18 This gene encodes a C2H2-type zinc finger protein which acts as a transcriptional repressor of genes involved in neuronal development. The encoded protein recognizes a specific sequence motif and recruits components of chromatin to target genes. Alternative splicing results in multiple transcript variants. ZBTB18 ENSG00000179456 NA
23254 kazrin, periplakin interacting protein This gene encodes a protein that plays a role in desmosome assembly, cell adhesion, cytoskeletal organization, and epidermal differentiation. This protein co-localizes with desmoplakin and the cytolinker protein periplakin. In general, this protein localizes to the nucleus, desmosomes, cell membrane, and cortical actin-based structures. Some isoforms of this protein also associate with microtubules. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Additional splice variants have been described but their biological validity has not been verified. KAZN ENSG00000189337 NA
100507347 VIM antisense RNA 1 NA VIM-AS1 ENSG00000229124 NA
25989 unc-51 like kinase 3 NA ULK3 ENSG00000140474 NA
# Save the Ensembl gene IDs queried for factor 17, one per line
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_", 17, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
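
Some queried IDs come back with `notfound` set to `TRUE` (for example ENSG00000256309 in the table above), and the export above writes them alongside the annotated genes. If a downstream tool expects only annotated genes, the matched and unmatched IDs could be separated first. The following is a minimal sketch on a toy data frame, not part of the original analysis: `out_demo`, `matched_ids`, `unmatched_ids`, and the temporary output path are hypothetical names, and the three rows are copied from the table above.

```r
# Toy stand-in for the data frame returned by mygene::queryMany();
# column names mirror the tables in this document (hypothetical object).
out_demo <- data.frame(
  query    = c("ENSG00000175265", "ENSG00000256309", "ENSG00000197971"),
  symbol   = c("GOLGA8A", NA, "MBP"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# notfound is TRUE for unmatched IDs and NA otherwise;
# %in% returns FALSE for NA, so no NA ends up in the logical index.
unmatched <- out_demo$notfound %in% TRUE

matched_ids   <- out_demo$query[!unmatched]
unmatched_ids <- out_demo$query[unmatched]

# Write only the matched IDs, one per line, to a temporary file
# (an assumed location, not the path used in the analysis).
out_file <- tempfile(fileext = ".txt")
write.table(matched_ids, out_file,
            col.names = FALSE, row.names = FALSE, quote = FALSE)
```

Keeping `unmatched_ids` in a separate vector makes it easy to inspect which Ensembl IDs were not found without losing them from the record.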

Factor 18 Annotations

out <- mygene::queryMany(gene_list[18, ], scopes = "ensembl.gene", fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol query summary name X_id notfound
FTL ENSG00000087086 This gene encodes the light subunit of the ferritin protein. Ferritin is the major intracellular iron storage protein in prokaryotes and eukaryotes. It is composed of 24 subunits of the heavy and light ferritin chains. Variation in ferritin subunit composition may affect the rates of iron uptake and release in different tissues. A major function of ferritin is the storage of iron in a soluble and nontoxic state. Defects in this light chain ferritin gene are associated with several neurodegenerative diseases and hyperferritinemia-cataract syndrome. This gene has multiple pseudogenes. ferritin, light polypeptide 2512 NA
KRT13 ENSG00000171401 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. keratin 13 3860 NA
RGS5 ENSG00000143248 This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. regulator of G-protein signaling 5 8490 NA
GPX3 ENSG00000211445 This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. glutathione peroxidase 3 2878 NA
NA ENSG00000117289 NA NA NA TRUE
FTH1 ENSG00000167996 This gene encodes the heavy subunit of ferritin, the major intracellular iron storage protein in prokaryotes and eukaryotes. It is composed of 24 subunits of the heavy and light ferritin chains. Variation in ferritin subunit composition may affect the rates of iron uptake and release in different tissues. A major function of ferritin is the storage of iron in a soluble and nontoxic state. Defects in ferritin proteins are associated with several neurodegenerative diseases. This gene has multiple pseudogenes. Several alternatively spliced transcript variants have been observed, but their biological validity has not been determined. ferritin heavy chain 1 2495 NA
ACTA2 ENSG00000107796 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in smooth muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. actin, alpha 2, smooth muscle, aorta 59 NA
KRT4 ENSG00000170477 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 4 3851 NA
PKD1 ENSG00000008710 This gene encodes a member of the polycystin protein family. The encoded glycoprotein contains a large N-terminal extracellular region, multiple transmembrane domains and a cytoplasmic C-tail. It is an integral membrane protein that functions as a regulator of calcium permeable cation channels and intracellular calcium homoeostasis. It is also involved in cell-cell/matrix interactions and may modulate G-protein-coupled signal-transduction pathways. It plays a role in renal tubular development, and mutations in this gene cause autosomal dominant polycystic kidney disease type 1 (ADPKD1). ADPKD1 is characterized by the growth of fluid-filled cysts that replace normal renal tissue and result in end-stage renal failure. Splice variants encoding different isoforms have been noted for this gene. Also, six pseudogenes, closely linked in a known duplicated region on chromosome 16p, have been described. polycystin 1, transient receptor potential channel interacting 5310 NA
B2M ENSG00000166710 This gene encodes a serum protein found in association with the major histocompatibility complex (MHC) class I heavy chain on the surface of nearly all nucleated cells. The protein has a predominantly beta-pleated sheet structure that can form amyloid fibrils in some pathological conditions. The encoded antimicrobial protein displays antibacterial activity in amniotic fluid. A mutation in this gene has been shown to result in hypercatabolic hypoproteinemia. beta-2-microglobulin 567 NA
TPM2 ENSG00000198467 This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. tropomyosin 2 (beta) 7169 NA
MCAM ENSG00000076706 NA melanoma cell adhesion molecule 4162 NA
KRT10 ENSG00000186395 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. keratin 10 3858 NA
SPRR3 ENSG00000163209 NA small proline rich protein 3 6707 NA
ITGA8 ENSG00000077943 Integrins are heterodimeric transmembrane receptor proteins that mediate numerous cellular processes including cell adhesion, cytoskeletal rearrangement, and activation of cell signaling pathways. Integrins are composed of alpha and beta subunits. This gene encodes the alpha 8 subunit of the heterodimeric integrin alpha8beta1 protein. The encoded protein is a single-pass type 1 membrane protein that contains multiple FG-GAP repeats. This repeat is predicted to fold into a beta propeller structure. This gene regulates the recruitment of mesenchymal cells into epithelial structures, mediates cell-cell interactions, and regulates neurite outgrowth of sensory and motor neurons. The integrin alpha8beta1 protein thus plays an important role in wound-healing and organogenesis. Mutations in this gene have been associated with renal hypodysplasia/aplasia-1 (RHDA1) and with several animal models of chronic kidney disease. Alternate splicing results in multiple transcript variants encoding distinct isoforms. integrin subunit alpha 8 8516 NA
ELN ENSG00000049540 This gene encodes a protein that is one of the two components of elastic fibers. The encoded protein is rich in hydrophobic amino acids such as glycine and proline, which form mobile hydrophobic regions bounded by crosslinks between lysine residues. Deletions and mutations in this gene are associated with supravalvular aortic stenosis (SVAS) and autosomal dominant cutis laxa. Multiple transcript variants encoding different isoforms have been found for this gene. elastin 2006 NA
NOTCH3 ENSG00000074181 This gene encodes the third discovered human homologue of the Drosophila melanogaster type I membrane protein notch. In Drosophila, notch interaction with its cell-bound ligands (delta, serrate) establishes an intercellular signalling pathway that plays a key role in neural development. Homologues of the notch-ligands have also been identified in human, but precise interactions between these ligands and the human notch homologues remain to be determined. Mutations in NOTCH3 have been identified as the underlying cause of cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL). notch 3 4854 NA
LAMB1 ENSG00000091136 Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. Several isoforms of each chain have been described. Different alpha, beta and gamma chain isomers combine to give rise to different heterotrimeric laminin isoforms which are designated by Arabic numerals in the order of their discovery, i.e. alpha1beta1gamma1 heterotrimer is laminin 1. The biological functions of the different chains and trimer molecules are largely unknown, but some of the chains have been shown to differ with respect to their tissue distribution, presumably reflecting diverse functions in vivo. This gene encodes the beta chain isoform laminin, beta 1. The beta 1 chain has 7 structurally distinct domains which it shares with other beta chain isomers. The C-terminal helical region containing domains I and II are separated by domain alpha, domains III and V contain several EGF-like repeats, and domains IV and VI have a globular conformation. Laminin, beta 1 is expressed in most tissues that produce basement membranes, and is one of the 3 chains constituting laminin 1, the first laminin isolated from Engelbreth-Holm-Swarm (EHS) tumor. A sequence in the beta 1 chain that is involved in cell attachment, chemotaxis, and binding to the laminin receptor was identified and shown to have the capacity to inhibit metastasis. laminin subunit beta 1 3912 NA
THBS1 ENSG00000137801 The protein encoded by this gene is a subunit of a disulfide-linked homotrimeric protein. This protein is an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein can bind to fibrinogen, fibronectin, laminin, type V collagen and integrins alpha-V/beta-1. This protein has been shown to play roles in platelet aggregation, angiogenesis, and tumorigenesis. thrombospondin 1 7057 NA
HBB ENSG00000244734 The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. hemoglobin subunit beta 3043 NA
HLA-E ENSG00000204592 HLA-E belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. HLA-E binds a restricted subset of peptides derived from the leader peptides of other class I molecules. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon one encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region, and exons 6 and 7 encode the cytoplasmic tail. major histocompatibility complex, class I, E 3133 NA
AC019349.5 ENSG00000229732 NA NA ENSG00000229732 NA
ARHGEF10L ENSG00000074964 ARHGEF10L is a member of the RhoGEF family of guanine nucleotide exchange factors (GEFs) that activate Rho GTPases (Winkler et al., 2005 [PubMed 16112081]). Rho guanine nucleotide exchange factor 10 like 55160 NA
SPTBN1 ENSG00000115306 Spectrin is an actin crosslinking and molecular scaffold protein that links the plasma membrane to the actin cytoskeleton, and functions in the determination of cell shape, arrangement of transmembrane proteins, and organization of organelles. It is composed of two antiparallel dimers of alpha- and beta- subunits. This gene is one member of a family of beta-spectrin genes. The encoded protein contains an N-terminal actin-binding domain, and 17 spectrin repeats which are involved in dimer formation. Multiple transcript variants encoding different isoforms have been found for this gene. spectrin beta, non-erythrocytic 1 6711 NA
FKBP8 ENSG00000105701 The protein encoded by this gene is a member of the immunophilin protein family, which play a role in immunoregulation and basic cellular processes involving protein folding and trafficking. Unlike the other members of the family, this encoded protein does not seem to have PPIase/rotamase activity. It may have a role in neurons associated with memory function. FK506 binding protein 8 23770 NA
NOV ENSG00000136999 The protein encoded by this gene is a small secreted cysteine-rich protein and a member of the CCN family of regulatory proteins. CCN family proteins associate with the extracellular matrix and play an important role in cardiovascular and skeletal development, fibrosis and cancer development. nephroblastoma overexpressed 4856 NA
MYH11 ENSG00000133392 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. myosin, heavy chain 11, smooth muscle 4629 NA
ACTA2-AS1 ENSG00000180139 NA ACTA2 antisense RNA 1 ENSG00000180139 NA
LYZ ENSG00000090382 This gene encodes human lysozyme, whose natural substrate is the bacterial cell wall peptidoglycan (cleaving the beta[1-4]glycosidic linkages between N-acetylmuramic acid and N-acetylglucosamine). Lysozyme is one of the antimicrobial agents found in human milk, and is also present in spleen, lung, kidney, white blood cells, plasma, saliva, and tears. The protein has antibacterial activity against a number of bacterial species. Missense mutations in this gene have been identified in heritable renal amyloidosis. lysozyme 4069 NA
SLC11A1 ENSG00000018280 This gene is a member of the solute carrier family 11 (proton-coupled divalent metal ion transporters) family and encodes a multi-pass membrane protein. The protein functions as a divalent transition metal (iron and manganese) transporter involved in iron metabolism and host resistance to certain pathogens. Mutations in this gene have been associated with susceptibility to infectious diseases such as tuberculosis and leprosy, and inflammatory diseases such as rheumatoid arthritis and Crohn disease. Alternatively spliced variants that encode different protein isoforms have been described but the full-length nature of only one has been determined. solute carrier family 11 member 1 6556 NA
DCN ENSG00000011465 This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. decorin 1634 NA
COL6A3 ENSG00000163359 This gene encodes the alpha-3 chain, one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The alpha-3 chain of type VI collagen is much larger than the alpha-1 and -2 chains. This difference in size is largely due to an increase in the number of subdomains, similar to von Willebrand Factor type A domains, that are found in the amino terminal globular domain of all the alpha chains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in the type VI collagen genes are associated with Bethlem myopathy, a rare autosomal dominant proximal myopathy with early childhood onset. Mutations in this gene are also a cause of Ullrich congenital muscular dystrophy, also referred to as Ullrich scleroatonic muscular dystrophy, an autosomal recessive congenital myopathy that is more severe than Bethlem myopathy. Multiple transcript variants have been identified, but the full-length nature of only some of these variants has been described. collagen type VI alpha 3 chain 1293 NA
COL27A1 ENSG00000196739 This gene encodes a member of the fibrillar collagen family, and plays a role during the calcification of cartilage and the transition of cartilage to bone. The encoded protein product is a preproprotein. It includes an N-terminal signal peptide, which is followed by an N-terminal propeptide, mature peptide and a C-terminal propeptide. The N-terminal propeptide contains thrombospondin N-terminal-like and laminin G-like domains. The mature peptide is a major triple-helical region. The C-terminal propeptide, also known as COLFI domain, plays crucial roles in tissue growth and repair. Mutations in this gene cause Steel syndrome. Alternatively spliced transcript variants have been found, but the full-length nature of some variants has not been determined. collagen type XXVII alpha 1 85301 NA
FASN ENSG00000169710 The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. fatty acid synthase 2194 NA
ACTB ENSG00000075624 This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. actin, beta 60 NA
PTGDS ENSG00000107317 The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may be also involved in the regulation of non-rapid eye movement sleep. prostaglandin D2 synthase 5730 NA
CSTB ENSG00000160213 The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and kininogens. This gene encodes a stefin that functions as an intracellular thiol protease inhibitor. The protein is able to form a dimer stabilized by noncovalent forces, inhibiting papain and cathepsins L, H and B. The protein is thought to play a role in protecting against the proteases leaking from lysosomes. Evidence indicates that mutations in this gene are responsible for the primary defects in patients with progressive myoclonic epilepsy (EPM1). cystatin B 1476 NA
ACTG2 ENSG00000163017 Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. actin, gamma 2, smooth muscle, enteric 72 NA
LOC100129518 ENSG00000112096 NA uncharacterized LOC100129518 100129518 NA
SOD2 ENSG00000112096 This gene is a member of the iron/manganese superoxide dismutase family. It encodes a mitochondrial protein that forms a homotetramer and binds one manganese ion per subunit. This protein binds to the superoxide byproducts of oxidative phosphorylation and converts them to hydrogen peroxide and diatomic oxygen. Mutations in this gene have been associated with idiopathic cardiomyopathy (IDC), premature aging, sporadic motor neuron disease, and cancer. Alternative splicing of this gene results in multiple transcript variants. A related pseudogene has been identified on chromosome 1. superoxide dismutase 2, mitochondrial 6648 NA
ACACB ENSG00000076555 Acetyl-CoA carboxylase (ACC) is a complex multifunctional enzyme system. ACC is a biotin-containing enzyme which catalyzes the carboxylation of acetyl-CoA to malonyl-CoA, the rate-limiting step in fatty acid synthesis. ACC-beta is thought to control fatty acid oxidation by means of the ability of malonyl-CoA to inhibit carnitine-palmitoyl-CoA transferase I, the rate-limiting step in fatty acid uptake and oxidation by mitochondria. ACC-beta may be involved in the regulation of fatty acid oxidation, rather than fatty acid biosynthesis. There is evidence for the presence of two ACC-beta isoforms. acetyl-CoA carboxylase beta 32 NA
COL18A1 ENSG00000182871 This gene encodes the alpha chain of type XVIII collagen. This collagen is one of the multiplexins, extracellular matrix proteins that contain multiple triple-helix domains (collagenous domains) interrupted by non-collagenous domains. A long isoform of the protein has an N-terminal domain that is homologous to the extracellular part of frizzled receptors. Proteolytic processing at several endogenous cleavage sites in the C-terminal domain results in production of endostatin, a potent antiangiogenic protein that is able to inhibit angiogenesis and tumor growth. Mutations in this gene are associated with Knobloch syndrome. The main features of this syndrome involve retinal abnormalities, so type XVIII collagen may play an important role in retinal structure and in neural tube closure. Alternative splicing results in multiple transcript variants. collagen type XVIII alpha 1 chain 80781 NA
ACSL1 ENSG00000151726 The protein encoded by this gene is an isozyme of the long-chain fatty-acid-coenzyme A ligase family. Although differing in substrate specificity, subcellular localization, and tissue distribution, all isozymes of this family convert free long-chain fatty acids into fatty acyl-CoA esters, and thereby play a key role in lipid biosynthesis and fatty acid degradation. Several transcript variants encoding different isoforms have been found for this gene. acyl-CoA synthetase long-chain family member 1 2180 NA
TIMP3 ENSG00000100234 This gene belongs to the TIMP gene family. The proteins encoded by this gene family are inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix (ECM). Expression of this gene is induced in response to mitogenic stimulation and this netrin domain-containing protein is localized to the ECM. Mutations in this gene have been associated with the autosomal dominant disorder Sorsby’s fundus dystrophy. TIMP metallopeptidase inhibitor 3 7078 NA
MYLK ENSG00000065534 This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3’ region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts. myosin light chain kinase 4638 NA
PPFIA4 ENSG00000143847 PPFIA4, or liprin-alpha-4, belongs to the liprin-alpha gene family. See liprin-alpha-1 (LIP1, or PPFIA1; MIM 611054) for background on liprins. PTPRF interacting protein alpha 4 8497 NA
MGP ENSG00000111341 The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. matrix Gla protein 4256 NA
MYO1D ENSG00000176658 NA myosin ID 4642 NA
GPX1 ENSG00000233276 This gene encodes a member of the glutathione peroxidase family. Glutathione peroxidase functions in the detoxification of hydrogen peroxide, and is one of the most important antioxidant enzymes in humans. This protein is one of only a few proteins known in higher vertebrates to contain selenocysteine, which occurs at the active site of glutathione peroxidase and is coded by UGA, which normally functions as a translation termination codon. In addition, this protein is characterized by a polyalanine sequence polymorphism in the N-terminal region, which includes three alleles with five, six or seven alanine (ALA) repeats in this sequence. The allele with five ALA repeats is significantly associated with breast cancer risk. Two alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. glutathione peroxidase 1 2876 NA
OAZ1 ENSG00000104904 The protein encoded by this gene belongs to the ornithine decarboxylase antizyme family, which plays a role in cell growth and proliferation by regulating intracellular polyamine levels. Expression of antizymes requires +1 ribosomal frameshifting, which is enhanced by high levels of polyamines. Antizymes in turn bind to and inhibit ornithine decarboxylase (ODC), the key enzyme in polyamine biosynthesis; thus, completing the auto-regulatory circuit. This gene encodes antizyme 1, the first member of the antizyme family, that has broad tissue distribution, and negatively regulates intracellular polyamine levels by binding to and targeting ODC for degradation, as well as inhibiting polyamine uptake. Antizyme 1 mRNA contains two potential in-frame AUGs; and studies in rat suggest that alternative use of the two translation initiation sites results in N-terminally distinct protein isoforms with different subcellular localization. Alternatively spliced transcript variants have also been noted for this gene. ornithine decarboxylase antizyme 1 4946 NA
HSPB1 ENSG00000106211 The protein encoded by this gene is induced by environmental stress and developmental changes. The encoded protein is involved in stress resistance and actin organization and translocates from the cytoplasm to the nucleus upon stress induction. Defects in this gene are a cause of Charcot-Marie-Tooth disease type 2F (CMT2F) and distal hereditary motor neuropathy (dHMN). heat shock protein family B (small) member 1 3315 NA
MICAL2 ENSG00000133816 NA microtubule associated monooxygenase, calponin and LIM domain containing 2 9645 NA
ANPEP ENSG00000166825 Aminopeptidase N is located in the small-intestinal and renal microvillar membrane, and also in other plasma membranes. In the small intestine aminopeptidase N plays a role in the final digestion of peptides generated from hydrolysis of proteins by gastric and pancreatic proteases. Its function in proximal tubular epithelial cells and other cell types is less clear. The large extracellular carboxyterminal domain contains a pentapeptide consensus sequence characteristic of members of the zinc-binding metalloproteinase superfamily. Sequence comparisons with known enzymes of this class showed that CD13 and aminopeptidase N are identical. The latter enzyme was thought to be involved in the metabolism of regulatory peptides by diverse cell types, including small intestinal and renal tubular epithelial cells, macrophages, granulocytes, and synaptic membranes from the CNS. Human aminopeptidase N is a receptor for one strain of human coronavirus that is an important cause of upper respiratory tract infections. Defects in this gene appear to be a cause of various types of leukemia or lymphoma. alanyl aminopeptidase, membrane 290 NA
MDGA1 ENSG00000112139 NA MAM domain containing glycosylphosphatidylinositol anchor 1 266727 NA
ECM1 ENSG00000143369 This gene encodes a soluble protein that is involved in endochondral bone formation, angiogenesis, and tumor biology. It also interacts with a variety of extracellular and structural proteins, contributing to the maintenance of skin integrity and homeostasis. Mutations in this gene are associated with lipoid proteinosis disorder (also known as hyalinosis cutis et mucosae or Urbach-Wiethe disease) that is characterized by generalized thickening of skin, mucosae and certain viscera. Alternatively spliced transcript variants encoding distinct isoforms have been described for this gene. extracellular matrix protein 1 1893 NA
JAG1 ENSG00000101384 The jagged 1 protein encoded by JAG1 is the human homolog of the Drosophila jagged protein. Human jagged 1 is the ligand for the receptor notch 1, the latter a human homolog of the Drosophila jagged receptor notch. Mutations that alter the jagged 1 protein cause Alagille syndrome. Jagged 1 signalling through notch 1 has also been shown to play a role in hematopoiesis. jagged 1 182 NA
S100A16 ENSG00000188643 NA S100 calcium binding protein A16 140576 NA
C1S ENSG00000182326 This gene encodes a serine protease, which is a major constituent of the human complement subcomponent C1. C1s associates with two other complement components C1r and C1q in order to yield the first component of the serum complement system. Defects in this gene are the cause of selective C1s deficiency. complement component 1, s subcomponent 716 NA
AEBP1 ENSG00000106624 This gene encodes a member of carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. AE binding protein 1 165 NA
SCD ENSG00000099194 This gene encodes an enzyme involved in fatty acid biosynthesis, primarily the synthesis of oleic acid. The protein belongs to the fatty acid desaturase family and is an integral membrane protein located in the endoplasmic reticulum. Transcripts of approximately 3.9 and 5.2 kb, differing only by alternative polyadenylation signals, have been detected. A gene encoding a similar enzyme is located on chromosome 4 and a pseudogene of this gene is located on chromosome 17. stearoyl-CoA desaturase 6319 NA
FRZB ENSG00000162998 The protein encoded by this gene is a secreted protein that is involved in the regulation of bone development. Defects in this gene are a cause of female-specific osteoarthritis (OA) susceptibility. frizzled-related protein 2487 NA
TIAM1 ENSG00000156299 NA T-cell lymphoma invasion and metastasis 1 7074 NA
MAP1B ENSG00000131711 This gene encodes a protein that belongs to the microtubule-associated protein family. The proteins of this family are thought to be involved in microtubule assembly, which is an essential step in neurogenesis. The product of this gene is a precursor polypeptide that presumably undergoes proteolytic processing to generate the final MAP1B heavy chain and LC1 light chain. Gene knockout studies of the mouse microtubule-associated protein 1B gene suggested an important role in development and function of the nervous system. microtubule associated protein 1B 4131 NA
ACTN4 ENSG00000130402 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, alpha actinin isoform which is concentrated in the cytoplasm, and thought to be involved in metastatic processes. Mutations in this gene have been associated with focal and segmental glomerulosclerosis. actinin alpha 4 81 NA
ANXA5 ENSG00000164111 The protein encoded by this gene belongs to the annexin family of calcium-dependent phospholipid binding proteins some of which have been implicated in membrane-related events along exocytotic and endocytotic pathways. Annexin 5 is a phospholipase A2 and protein kinase C inhibitory protein with calcium channel activity and a potential role in cellular signal transduction, inflammation, growth and differentiation. Annexin 5 has also been described as placental anticoagulant protein I, vascular anticoagulant-alpha, endonexin II, lipocortin V, placental protein 4 and anchorin CII. The gene spans 29 kb containing 13 exons, and encodes a single transcript of approximately 1.6 kb and a protein product with a molecular weight of about 35 kDa. annexin A5 308 NA
CRNN ENSG00000143536 This gene encodes a member of the ‘fused gene’ family of proteins, which contain N-terminus EF-hand domains and multiple tandem peptide repeats. The encoded protein contains two EF-hand Ca2+ binding domains in its N-terminus and two glutamine- and threonine-rich 60 amino acid repeats in its C-terminus. This gene, also known as squamous epithelial heat shock protein 53, may play a role in the mucosal/epithelial immune response and epidermal differentiation. cornulin 49860 NA
RNASE1 ENSG00000129538 This gene encodes a member of the pancreatic-type of secretory ribonucleases, a subset of the ribonuclease A superfamily. The encoded endonuclease cleaves internal phosphodiester RNA bonds on the 3’-side of pyrimidine bases. It prefers poly(C) as a substrate and hydrolyzes 2’,3’-cyclic nucleotides, with a pH optimum near 8.0. The encoded protein is monomeric and more commonly acts to degrade ds-RNA over ss-RNA. Alternative splicing occurs at this locus and four transcript variants encoding the same protein have been identified. ribonuclease A family member 1, pancreatic 6035 NA
CD9 ENSG00000010278 This gene encodes a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Tetraspanins are cell surface glycoproteins with four transmembrane domains that form multimeric complexes with other cell surface proteins. The encoded protein functions in many cellular processes including differentiation, adhesion, and signal transduction, and expression of this gene plays a critical role in the suppression of cancer cell motility and metastasis. CD9 molecule 928 NA
CRYAB ENSG00000109846 Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. crystallin alpha B 1410 NA
LPL ENSG00000175445 LPL encodes lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue. LPL functions as a homodimer, and has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Severe mutations that cause LPL deficiency result in type I hyperlipoproteinemia, while less extreme mutations in LPL are linked to many disorders of lipoprotein metabolism. lipoprotein lipase 4023 NA
SOGA1 ENSG00000149639 NA suppressor of glucose, autophagy associated 1 140710 NA
PDK4 ENSG00000004799 This gene is a member of the PDK/BCKDK protein kinase family and encodes a mitochondrial protein with a histidine kinase domain. This protein is located in the matrix of the mitochondria and inhibits the pyruvate dehydrogenase complex by phosphorylating one of its subunits, thereby contributing to the regulation of glucose metabolism. Expression of this gene is regulated by glucocorticoids, retinoic acid and insulin. pyruvate dehydrogenase kinase 4 5166 NA
MYH10 ENSG00000133026 This gene encodes a member of the myosin superfamily. The protein represents a conventional non-muscle myosin; it should not be confused with the unconventional myosin-10 (MYO10). Myosins are actin-dependent motor proteins with diverse functions including regulation of cytokinesis, cell motility, and cell polarity. Mutations in this gene have been associated with May-Hegglin anomaly and developmental defects in brain and heart. Multiple transcript variants encoding different isoforms have been found for this gene. myosin, heavy chain 10, non-muscle 4628 NA
GFAP ENSG00000131095 This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. glial fibrillary acidic protein 2670 NA
GYPC ENSG00000136732 Glycophorin C (GYPC) is an integral membrane glycoprotein. It is a minor species carried by human erythrocytes, but plays an important role in regulating the mechanical stability of red cells. A number of glycophorin C mutations have been described. The Gerbich and Yus phenotypes are due to deletion of exon 3 and 2, respectively. The Webb and Duch antigens, also known as glycophorin D, result from single point mutations of the glycophorin C gene. The glycophorin C protein has very little homology with glycophorins A and B. Alternate splicing results in multiple transcript variants. glycophorin C (Gerbich blood group) 2995 NA
CFD ENSG00000197766 This gene encodes a member of the S1, or chymotrypsin, family of serine peptidases. This protease catalyzes the cleavage of factor B, the rate-limiting step of the alternative pathway of complement activation. This protein also functions as an adipokine, a cell signaling protein secreted by adipocytes, which regulates insulin secretion in mice. Mutations in this gene underlie complement factor D deficiency, which is associated with recurrent bacterial meningitis infections in human patients. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate the mature protease. complement factor D 1675 NA
IGFBP4 ENSG00000141753 This gene is a member of the insulin-like growth factor binding protein (IGFBP) family and encodes a protein with an IGFBP domain and a thyroglobulin type-I domain. The protein binds both insulin-like growth factors (IGFs) I and II and circulates in the plasma in both glycosylated and non-glycosylated forms. Binding of this protein prolongs the half-life of the IGFs and alters their interaction with cell surface receptors. insulin like growth factor binding protein 4 3487 NA
C1R ENSG00000159403 NA complement C1r subcomponent 715 NA
FMNL1 ENSG00000184922 This gene encodes a formin-related protein. Formin-related proteins have been implicated in morphogenesis, cytokinesis, and cell polarity. An alternative splice variant has been described but its full length sequence has not been determined. formin like 1 752 NA
MYL9 ENSG00000101335 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. myosin light chain 9 10398 NA
NECTIN1 ENSG00000110400 This gene encodes an adhesion protein that plays a role in the organization of adherens junctions and tight junctions in epithelial and endothelial cells. The protein is a calcium(2+)-independent cell-cell adhesion molecule that belongs to the immunoglobulin superfamily and has 3 extracellular immunoglobulin-like loops, a single transmembrane domain (in some isoforms), and a cytoplasmic region. This protein acts as a receptor for glycoprotein D (gD) of herpes simplex viruses 1 and 2 (HSV-1, HSV-2), and pseudorabies virus (PRV) and mediates viral entry into epithelial and neuronal cells. Mutations in this gene cause cleft lip and palate/ectodermal dysplasia 1 syndrome (CLPED1) as well as non-syndromic cleft lip with or without cleft palate (CL/P). Alternative splicing results in multiple transcript variants encoding proteins with distinct C-termini. nectin cell adhesion molecule 1 5818 NA
DENND3 ENSG00000105339 NA DENN domain containing 3 22898 NA
PODXL ENSG00000128567 This gene encodes a member of the sialomucin protein family. The encoded protein was originally identified as an important component of glomerular podocytes. Podocytes are highly differentiated epithelial cells with interdigitating foot processes covering the outer aspect of the glomerular basement membrane. Other biological activities of the encoded protein include: binding in a membrane protein complex with Na+/H+ exchanger regulatory factor to intracellular cytoskeletal elements, playing a role in hematopoietic cell differentiation, and being expressed in vascular endothelium cells and binding to L-selectin. podocalyxin like 5420 NA
LAMA5 ENSG00000130702 This gene encodes one of the vertebrate laminin alpha chains. Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. The protein encoded by this gene is the alpha-5 subunit of laminin-10 (laminin-511), laminin-11 (laminin-521) and laminin-15 (laminin-523). laminin subunit alpha 5 3911 NA
NA ENSG00000259716 NA NA NA TRUE
KRT14 ENSG00000186847 This gene encodes a member of the keratin family, the most diverse group of intermediate filaments. This gene product, a type I keratin, is usually found as a heterotetramer with two keratin 5 molecules, a type II keratin. Together they form the cytoskeleton of epithelial cells. Mutations in the genes for these keratins are associated with epidermolysis bullosa simplex. At least one pseudogene has been identified at 17p12-p11. keratin 14 3861 NA
ACTA1 ENSG00000143632 The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. actin, alpha 1, skeletal muscle 58 NA
CYBA ENSG00000051523 Cytochrome b is comprised of a light chain (alpha) and a heavy chain (beta). This gene encodes the light, alpha subunit which has been proposed as a primary component of the microbicidal oxidase system of phagocytes. Mutations in this gene are associated with autosomal recessive chronic granulomatous disease (CGD), that is characterized by the failure of activated phagocytes to generate superoxide, which is important for the microbicidal activity of these cells. cytochrome b-245 alpha chain 1535 NA
FAM129B ENSG00000136830 NA family with sequence similarity 129 member B 64855 NA
ADH1B ENSG00000196616 The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. alcohol dehydrogenase 1B (class I), beta polypeptide 125 NA
PRUNE2 ENSG00000106772 The protein encoded by this gene belongs to the B-cell CLL/lymphoma 2 and adenovirus E1B 19 kDa interacting family, whose members play roles in many cellular processes including apoptosis, cell transformation, and synaptic function. Several functions for this protein have been demonstrated including suppression of Ras homolog family member A activity, which results in reduced stress fiber formation and suppression of oncogenic cellular transformation. A high molecular weight isoform of this protein has also been shown to colocalize with Adaptor protein complex 2, beta-Adaptin and endodermal markers, suggesting an involvement in post-endocytic trafficking. In prostate cancer cells, this gene acts as a tumor suppressor and its expression is regulated by prostate cancer antigen 3, a non-protein coding gene on the opposite DNA strand in an intron of this gene. Prostate cancer antigen 3 regulates levels of this gene through formation of a double-stranded RNA that undergoes adenosine deaminase acting on RNA-dependent adenosine-to-inosine RNA editing. Alternative splicing results in multiple transcript variants. prune homolog 2 158471 NA
NPPA ENSG00000175206 The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. natriuretic peptide A 4878 NA
SKI ENSG00000157933 This gene encodes the nuclear protooncogene protein homolog of avian sarcoma viral (v-ski) oncogene. It functions as a repressor of TGF-beta signaling, and may play a role in neural tube development and muscle differentiation. SKI proto-oncogene 6497 NA
FBLIM1 ENSG00000162458 This gene encodes a protein with an N-terminal filamin-binding domain, a central proline-rich domain, and, multiple C-terminal LIM domains. This protein localizes at cell junctions and may link cell adhesion structures to the actin cytoskeleton. This protein may be involved in the assembly and stabilization of actin-filaments and likely plays a role in modulating cell adhesion, cell morphology and cell motility. This protein also localizes to the nucleus and may affect cardiomyocyte differentiation after binding with the CSX/NKX2-5 transcription factor. Alternative splicing results in multiple transcript variants encoding different isoforms. filamin binding LIM protein 1 54751 NA
LUM ENSG00000139329 This gene encodes a member of the small leucine-rich proteoglycan (SLRP) family that includes decorin, biglycan, fibromodulin, keratocan, epiphycan, and osteoglycin. In these bifunctional molecules, the protein moiety binds collagen fibrils and the highly charged hydrophilic glycosaminoglycans regulate interfibrillar spacings. Lumican is the major keratan sulfate proteoglycan of the cornea but is also distributed in interstitial collagenous matrices throughout the body. Lumican may regulate collagen fibril organization and circumferential growth, corneal transparency, and epithelial cell migration and tissue repair. lumican 4060 NA
PDLIM3 ENSG00000154553 The protein encoded by this gene contains a PDZ domain and a LIM domain, indicating that it may be involved in cytoskeletal assembly. In support of this, the encoded protein has been shown to bind the spectrin-like repeats of alpha-actinin-2 and to colocalize with alpha-actinin-2 at the Z lines of skeletal muscle. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. Aberrant alternative splicing of this gene may play a role in myotonic dystrophy. PDZ and LIM domain 3 27295 NA
AHDC1 ENSG00000126705 This gene encodes a protein containing two AT-hooks, which likely function in DNA binding. Mutations in this gene were found in individuals with Xia-Gibbs syndrome. AT-hook DNA binding motif containing 1 27245 NA
AKNA ENSG00000106948 NA AT-hook transcription factor 80709 NA
IFITM1 ENSG00000185885 NA interferon induced transmembrane protein 1 8519 NA
PLXND1 ENSG00000004399 NA plexin D1 23129 NA
C3 ENSG00000125730 Complement component C3 plays a central role in the activation of complement system. Its activation is required for both classical and alternative complement activation pathways. The encoded preproprotein is proteolytically processed to generate alpha and beta subunits that form the mature protein, which is then further processed to generate numerous peptide products. The C3a peptide, also known as the C3a anaphylatoxin, modulates inflammation and possesses antimicrobial activity. Mutations in this gene are associated with atypical hemolytic uremic syndrome and age-related macular degeneration in human patients. complement component 3 718 NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_",18,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
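
The export step above hard-codes the factor index (18) and writes every query ID, including IDs that mygene could not match (for example ENSG00000259716, which is flagged `notfound = TRUE` in the Factor 18 table). A minimal sketch of a reusable version is given here. The helper `write_factor_genes`, its `drop_notfound` argument, and the toy data frame are hypothetical illustrations, not part of the original analysis, and the sketch assumes the `notfound` column is present only when at least one query is unmatched, as the tables in this report suggest.

```r
# Hypothetical helper (not part of the original analysis): write the query IDs
# of one factor's annotation table to a text file, one ID per line.
write_factor_genes <- function(out, k, out_dir, drop_notfound = FALSE) {
  ids <- as.character(out$query)
  # Assumption: `notfound` exists only when some IDs were unmatched,
  # with TRUE for unmatched rows and NA otherwise.
  if (drop_notfound && "notfound" %in% names(out)) {
    ids <- ids[is.na(out$notfound)]
  }
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(ids, path, col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Toy stand-in for a queryMany() result, built from three Factor 18 rows.
toy_out <- data.frame(
  query    = c("ENSG00000125730", "ENSG00000259716", "ENSG00000186847"),
  symbol   = c("C3", NA, "KRT14"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Write to a temporary directory so the sketch does not touch ../utilities.
path <- write_factor_genes(toy_out, k = 18, out_dir = tempdir(),
                           drop_notfound = TRUE)
readLines(path)
```

With `drop_notfound = FALSE` the helper writes the same IDs as the original `write.table` call; whether unmatched IDs should be kept depends on what the downstream tool expects.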

Factor 19 Annotations

out <- mygene::queryMany(gene_list[19,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query symbol X_id summary name
ENSG00000143632 ACTA1 58 The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. actin, alpha 1, skeletal muscle
ENSG00000104879 CKM 1158 The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. creatine kinase, M-type
ENSG00000155657 TTN 7273 This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. titin
ENSG00000183091 NEB 4703 This gene encodes nebulin, a giant protein component of the cytoskeletal matrix that coexists with the thick and thin filaments within the sarcomeres of skeletal muscle. In most vertebrates, nebulin accounts for 3 to 4% of the total myofibrillar protein. The encoded protein contains approximately 30-amino acid long modules that can be classified into 7 types and other repeated modules. Protein isoform sizes vary from 600 to 800 kD due to alternative splicing that is tissue-, species-, and developmental stage-specific. Of the 183 exons in the nebulin gene, at least 43 are alternatively spliced, although exons 143 and 144 are not found in the same transcript. Of the several thousand transcript variants predicted for nebulin, the RefSeq Project has decided to create three representative RefSeq records. Mutations in this gene are associated with recessive nemaline myopathy. nebulin
ENSG00000196091 MYBPC1 4604 This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. myosin binding protein C, slow type
ENSG00000092054 MYH7 4625 Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. myosin, heavy chain 7, cardiac muscle, beta
ENSG00000109061 MYH1 4619 Myosin is a major contractile protein which converts chemical energy into mechanical energy through the hydrolysis of ATP. Myosin is a hexameric protein composed of a pair of myosin heavy chains (MYH) and two pairs of nonidentical light chains. Myosin heavy chains are encoded by a multigene family. In mammals at least 10 different myosin heavy chain (MYH) isoforms have been described from striated, smooth, and nonmuscle cells. These isoforms show expression that is spatially and temporally regulated during development. myosin, heavy chain 1, skeletal muscle, adult
ENSG00000111245 MYL2 4633 This gene encodes the regulatory light chain associated with cardiac myosin beta (or slow) heavy chain. Ca2+ triggers the phosphorylation of regulatory light chain that in turn triggers contraction. Mutations in this gene are associated with mid-left ventricular chamber type hypertrophic cardiomyopathy. myosin light chain 2
ENSG00000125414 MYH2 4620 Myosins are actin-based motor proteins that function in the generation of mechanical force in eukaryotic cells. Muscle myosins are heterohexamers composed of 2 myosin heavy chains and 2 pairs of nonidentical myosin light chains. This gene encodes a member of the class II or conventional myosin heavy chains, and functions in skeletal muscle contraction. This gene is found in a cluster of myosin heavy chain genes on chromosome 17. A mutation in this gene results in inclusion body myopathy-3. Multiple alternatively spliced variants, encoding the same protein, have been identified. myosin, heavy chain 2, skeletal muscle, adult
ENSG00000105048 TNNT1 7138 This gene encodes a protein that is a subunit of troponin, which is a regulatory complex located on the thin filament of the sarcomere. This complex regulates striated muscle contraction in response to fluctuations in intracellular calcium concentration. This complex is composed of three subunits: troponin C, which binds calcium, troponin T, which binds tropomyosin, and troponin I, which is an inhibitory subunit. This protein is the slow skeletal troponin T subunit. Mutations in this gene cause nemaline myopathy type 5, also known as Amish nemaline myopathy, a neuromuscular disorder characterized by muscle weakness and rod-shaped, or nemaline, inclusions in skeletal muscle fibers which affects infants, resulting in death due to respiratory insufficiency, usually in the second year. Multiple transcript variants encoding different isoforms have been found for this gene. troponin T1, slow skeletal type
ENSG00000075624 ACTB 60 This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. actin, beta
ENSG00000175206 NPPA 4878 The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. natriuretic peptide A
ENSG00000101470 TNNC2 7125 Troponin (Tn), a key protein complex in the regulation of striated muscle contraction, is composed of 3 subunits. The Tn-I subunit inhibits actomyosin ATPase, the Tn-T subunit binds tropomyosin and Tn-C, while the Tn-C subunit binds calcium and overcomes the inhibitory action of the troponin complex on actin filaments. The protein encoded by this gene is the Tn-C subunit. troponin C2, fast skeletal type
ENSG00000068976 PYGM 5837 This gene encodes a muscle enzyme involved in glycogenolysis. Highly similar enzymes encoded by different genes are found in liver and brain. Mutations in this gene are associated with McArdle disease (myophosphorylase deficiency), a glycogen storage disease of muscle. Alternative splicing results in multiple transcript variants. phosphorylase, glycogen, muscle
ENSG00000159173 TNNI1 7135 Troponin proteins associate with tropomyosin and regulate the calcium sensitivity of the myofibril contractile apparatus of striated muscles. Troponin I (TnI), along with troponin T (TnT) and troponin C (TnC), is one of 3 subunits that form the troponin complex of the thin filaments of striated muscle. TnI is the inhibitory subunit; blocking actin-myosin interactions and thereby mediating striated muscle relaxation. The TnI subfamily contains three genes: TnI-skeletal-fast-twitch, TnI-skeletal-slow-twitch, and TnI-cardiac. The TnI-fast and TnI-slow genes are expressed in fast-twitch and slow-twitch skeletal muscle fibers, respectively, while the TnI-cardiac gene is expressed exclusively in cardiac muscle tissue. This gene encodes the Troponin-I-skeletal-slow-twitch protein. This gene is expressed in cardiac and skeletal muscle during early development but is restricted to slow-twitch skeletal muscle fibers in adults. The encoded protein prevents muscle contraction by inhibiting calcium-mediated conformational changes in actin-myosin complexes. troponin I1, slow skeletal type
ENSG00000196296 ATP2A1 487 This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol to the sarcoplasmic reticulum lumen, and is involved in muscular excitation and contraction. Mutations in this gene cause some autosomal recessive forms of Brody disease, characterized by increasing impairment of muscular relaxation during exercise. Alternative splicing results in three transcript variants encoding different isoforms. ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 1
ENSG00000196218 RYR1 6261 This gene encodes a ryanodine receptor found in skeletal muscle. The encoded protein functions as a calcium release channel in the sarcoplasmic reticulum but also serves to connect the sarcoplasmic reticulum and transverse tubule. Mutations in this gene are associated with malignant hyperthermia susceptibility, central core disease, and minicore myopathy with external ophthalmoplegia. Alternatively spliced transcripts encoding different isoforms have been described. ryanodine receptor 1
ENSG00000164309 CMYA5 202333 NA cardiomyopathy associated 5
ENSG00000168530 MYL1 4632 Myosin is a hexameric ATPase cellular motor protein. It is composed of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable regulatory light chains. This gene encodes a myosin alkali light chain expressed in fast skeletal muscle. Two transcript variants have been identified for this gene. myosin light chain 1
ENSG00000244734 HBB 3043 The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin cause beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. hemoglobin subunit beta
ENSG00000164879 CA3 761 Carbonic anhydrase III (CAIII) is a member of a multigene family (at least six separate genes are known) that encodes carbonic anhydrase isozymes. These carbonic anhydrases are a class of metalloenzymes that catalyze the reversible hydration of carbon dioxide and are differentially expressed in a number of cell types. The expression of the CA3 gene is strictly tissue specific and present at high levels in skeletal muscle and much lower levels in cardiac and smooth muscle. A proportion of carriers of Duchenne muscular dystrophy have a higher CA3 level than normal. The gene spans 10.3 kb and contains seven exons and six introns. carbonic anhydrase 3
ENSG00000198125 MB 4151 This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. myoglobin
ENSG00000130595 TNNT3 7140 The binding of Ca(2+) to the trimeric troponin complex initiates the process of muscle contraction. Increased Ca(2+) concentrations produce a conformational change in the troponin complex that is transmitted to tropomyosin dimers situated along actin filaments. The altered conformation permits increased interaction between a myosin head and an actin filament which, ultimately, produces a muscle contraction. The troponin complex has protein subunits C, I, and T. Subunit C binds Ca(2+) and subunit I binds to actin and inhibits actin-myosin interaction. Subunit T binds the troponin complex to the tropomyosin complex and is also required for Ca(2+)-mediated activation of actomyosin ATPase activity. There are 3 different troponin T genes that encode tissue-specific isoforms of subunit T for fast skeletal-, slow skeletal-, and cardiac-muscle. This gene encodes fast skeletal troponin T protein; also known as troponin T type 3. Alternative splicing results in multiple transcript variants encoding additional distinct troponin T type 3 isoforms. A developmentally regulated switch between fetal/neonatal and adult troponin T type 3 isoforms occurs. Additional splice variants have been described but their biological validity has not been established. Mutations in this gene may cause distal arthrogryposis multiplex congenita type 2B (DA2B). troponin T3, fast skeletal type
ENSG00000237298 LOC101927055 101927055 NA uncharacterized LOC101927055
ENSG00000237298 TTN-AS1 100506866 NA TTN antisense RNA 1
ENSG00000154358 OBSCN 84033 The obscurin gene spans more than 150 kb, contains over 80 exons and encodes a protein of approximately 720 kDa. The encoded protein contains 68 Ig domains, 2 fibronectin domains, 1 calcium/calmodulin-binding domain, 1 RhoGEF domain with an associated PH domain, and 2 serine-threonine kinase domains. This protein belongs to the family of giant sarcomeric signaling proteins that includes titin and nebulin, and may have a role in the organization of myofibrils during assembly and may mediate interactions between the sarcoplasmic reticulum and myofibrils. Alternatively spliced transcript variants encoding different isoforms have been identified. obscurin, cytoskeletal calmodulin and titin-interacting RhoGEF
ENSG00000156508 EEF1A1 1915 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas, and the other isoform (alpha 2) is expressed in brain, heart and skeletal muscle. This isoform is identified as an autoantigen in 66% of patients with Felty syndrome. This gene has been found to have multiple copies on many chromosomes, some of which, if not all, represent different pseudogenes. eukaryotic translation elongation factor 1 alpha 1
ENSG00000197893 NRAP 4892 NA nebulin related anchoring protein
ENSG00000108515 ENO3 2027 This gene encodes one of the three enolase isoenzymes found in mammals. This isoenzyme is found in skeletal muscle cells in the adult where it may play a role in muscle development and regeneration. A switch from alpha enolase to beta enolase occurs in muscle tissue during development in rodents. Mutations in this gene have been associated with glycogen storage disease. Alternatively spliced transcript variants encoding different isoforms have been described. enolase 3
ENSG00000086967 MYBPC2 4606 This gene encodes a member of the myosin-binding protein C family. This family includes the fast-, slow- and cardiac-type isoforms, each of which is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The protein encoded by this locus is referred to as the fast-type isoform. Mutations in the related but distinct genes encoding the slow-type and cardiac-type isoforms have been associated with distal arthrogryposis, type 1 and hypertrophic cardiomyopathy, respectively. myosin binding protein C, fast type
ENSG00000239474 KLHL41 10324 This gene is a member of the kelch-like family. The encoded protein contains a BACK domain, a BTB/POZ domain, and 5 Kelch repeats. This protein is thought to function in skeletal muscle development and maintenance. Mutations in this gene have been associated with nemaline myopathy (NM), a rare congenital muscle disorder. kelch like family member 41
ENSG00000122304 PRM2 5620 Protamines substitute for histones in the chromatin of sperm during the haploid phase of spermatogenesis, and are the major DNA-binding proteins in the nucleus of sperm in many vertebrates. They package the sperm DNA into a highly condensed complex in a volume less than 5% of a somatic cell nucleus. Many mammalian species have only one protamine (protamine 1); however, a few species, including human and mouse, have two. This gene encodes protamine 2, which is cleaved to give rise to a family of protamine 2 peptides. Alternatively spliced transcript variants have also been found for this gene. protamine 2
ENSG00000197616 MYH6 4624 Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. myosin, heavy chain 6, cardiac muscle, alpha
ENSG00000130598 TNNI2 7136 This gene encodes a fast-twitch skeletal muscle protein, a member of the troponin I gene family, and a component of the troponin complex including troponin T, troponin C and troponin I subunits. The troponin complex, along with tropomyosin, is responsible for the calcium-dependent regulation of striated muscle contraction. Mouse studies show that this component is also present in vascular smooth muscle and may play a role in regulation of smooth muscle function. In addition to muscle tissues, this protein is found in corneal epithelium, cartilage where it is an inhibitor of angiogenesis to inhibit tumor growth and metastasis, and mammary gland where it functions as a co-activator of estrogen receptor-related receptor alpha. This protein also suppresses tumor growth in human ovarian carcinoma. Mutations in this gene cause myopathy and distal arthrogryposis type 2B. Alternatively spliced transcript variants have been found for this gene. troponin I2, fast skeletal type
ENSG00000180209 MYLPF 29895 NA myosin light chain, phosphorylatable, fast skeletal muscle
ENSG00000242349 NPPA-AS1 ENSG00000242349 NA NPPA antisense RNA 1
ENSG00000070756 PABPC1 26986 This gene encodes a poly(A) binding protein. The protein shuttles between the nucleus and cytoplasm and binds to the 3’ poly(A) tail of eukaryotic messenger RNAs via RNA-recognition motifs. The binding of this protein to poly(A) promotes ribosome recruitment and translation initiation; it is also required for poly(A) shortening which is the first step in mRNA decay. The gene is part of a small gene family including three protein-coding genes and several pseudogenes. poly(A) binding protein cytoplasmic 1
ENSG00000100345 MYH9 4627 This gene encodes a conventional non-muscle myosin; this protein should not be confused with the unconventional myosin-9a or 9b (MYO9A or MYO9B). The encoded protein is a myosin IIA heavy chain that contains an IQ domain and a myosin head-like domain which is involved in several important functions, including cytokinesis, cell motility and maintenance of cell shape. Defects in this gene have been associated with non-syndromic sensorineural deafness autosomal dominant type 17, Epstein syndrome, Alport syndrome with macrothrombocytopenia, Sebastian syndrome, Fechtner syndrome and macrothrombocytopenia with progressive sensorineural deafness. myosin, heavy chain 9, non-muscle
ENSG00000106631 MYL7 58498 NA myosin light chain 7
ENSG00000114854 TNNC1 7134 Troponin is a central regulatory protein of striated muscle contraction, and together with tropomyosin, is located on the actin filament. Troponin consists of 3 subunits: TnI, which is the inhibitor of actomyosin ATPase; TnT, which contains the binding site for tropomyosin; and TnC, the protein encoded by this gene. The binding of calcium to TnC abolishes the inhibitory action of TnI, thus allowing the interaction of actin with myosin, the hydrolysis of ATP, and the generation of tension. Mutations in this gene are associated with cardiomyopathy dilated type 1Z. troponin C1, slow skeletal and cardiac type
ENSG00000132475 H3F3B 3021 Histones are basic nuclear proteins that are responsible for the nucleosome structure of the chromosomal fiber in eukaryotes. Two molecules of each of the four core histones (H2A, H2B, H3, and H4) form an octamer, around which approximately 146 bp of DNA is wrapped in repeating units, called nucleosomes. The linker histone, H1, interacts with linker DNA between nucleosomes and functions in the compaction of chromatin into higher order structures. This gene contains introns and its mRNA is polyadenylated, unlike most histone genes. The protein encoded by this gene is a replication-independent histone that is a member of the histone H3 family. Pseudogenes of this gene have been identified on the X chromosome, and on chromosomes 5, 13 and 17. H3 histone, family 3B (H3.3B)
ENSG00000101210 EEF1A2 1917 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 2) is expressed in brain, heart and skeletal muscle, and the other isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas. This gene may be critical in the development of ovarian cancer. eukaryotic translation elongation factor 1 alpha 2
ENSG00000177791 MYOZ1 58529 The protein encoded by this gene is primarily expressed in the skeletal muscle, and belongs to the myozenin family. Members of this family function as calcineurin-interacting proteins that help tether calcineurin to the sarcomere of cardiac and skeletal muscle. They play an important role in modulation of calcineurin signaling. myozenin 1
ENSG00000196205 EEF1A1P5 ENSG00000196205 NA eukaryotic translation elongation factor 1 alpha 1 pseudogene 5
ENSG00000143549 TPM3 7170 This gene encodes a member of the tropomyosin family of actin-binding proteins. Tropomyosins are dimers of coiled-coil proteins that provide stability to actin filaments and regulate access of other actin-binding proteins. Mutations in this gene result in autosomal dominant nemaline myopathy and other muscle disorders. This locus is involved in translocations with other loci, including anaplastic lymphoma receptor tyrosine kinase (ALK) and neurotrophic tyrosine kinase receptor type 1 (NTRK1), which result in the formation of fusion proteins that act as oncogenes. There are numerous pseudogenes for this gene on different chromosomes. Alternative splicing results in multiple transcript variants. tropomyosin 3
ENSG00000175646 PRM1 5619 NA protamine 1
ENSG00000100316 RPL3 6122 Ribosomes, the complexes that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L3P family of ribosomal proteins and it is located in the cytoplasm. The protein can bind to the HIV-1 TAR mRNA, and it has been suggested that the protein contributes to tat-mediated transactivation. This gene is co-transcribed with several small nucleolar RNA genes, which are located in several of this gene’s introns. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. ribosomal protein L3
ENSG00000188536 HBA2 3040 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. hemoglobin subunit alpha 2
ENSG00000108654 DDX5 1655 DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure, such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of this family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. This gene encodes a DEAD box protein, which is a RNA-dependent ATPase, and also a proliferation-associated nuclear antigen, specifically reacting with the simian virus 40 tumor antigen. Alternative splicing results in multiple transcript variants. DEAD-box helicase 5
ENSG00000112096 LOC100129518 100129518 NA uncharacterized LOC100129518
ENSG00000112096 SOD2 6648 This gene is a member of the iron/manganese superoxide dismutase family. It encodes a mitochondrial protein that forms a homotetramer and binds one manganese ion per subunit. This protein binds to the superoxide byproducts of oxidative phosphorylation and converts them to hydrogen peroxide and diatomic oxygen. Mutations in this gene have been associated with idiopathic cardiomyopathy (IDC), premature aging, sporadic motor neuron disease, and cancer. Alternative splicing of this gene results in multiple transcript variants. A related pseudogene has been identified on chromosome 1. superoxide dismutase 2, mitochondrial
ENSG00000120129 DUSP1 1843 The expression of DUSP1 gene is induced in human skin fibroblasts by oxidative/heat stress and growth factors. It specifies a protein with structural features similar to members of the non-receptor-type protein-tyrosine phosphatase family, and which has significant amino-acid sequence similarity to a Tyr/Ser-protein phosphatase encoded by the late gene H1 of vaccinia virus. The bacterially expressed and purified DUSP1 protein has intrinsic phosphatase activity, and specifically inactivates mitogen-activated protein (MAP) kinase in vitro by the concomitant dephosphorylation of both its phosphothreonine and phosphotyrosine residues. Furthermore, it suppresses the activation of MAP kinase by oncogenic ras in extracts of Xenopus oocytes. Thus, DUSP1 may play an important role in the human cellular response to environmental stress as well as in the negative regulation of cellular proliferation. dual specificity phosphatase 1
ENSG00000123384 LRP1 4035 This gene encodes a member of the low-density lipoprotein receptor family of proteins. The encoded preproprotein is proteolytically processed by furin to generate 515 kDa and 85 kDa subunits that form the mature receptor (PMID: 8546712). This receptor is involved in several cellular processes, including intracellular signaling, lipid homeostasis, and clearance of apoptotic cells. In addition, the encoded protein is necessary for the alpha 2-macroglobulin-mediated clearance of secreted amyloid precursor protein and beta-amyloid, the main component of amyloid plaques found in Alzheimer patients. Expression of this gene decreases with age and has been found to be lower than controls in brain tissue from Alzheimer’s disease patients. LDL receptor related protein 1
ENSG00000167460 TPM4 7171 This gene encodes a member of the tropomyosin family of actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosins are dimers of coiled-coil proteins that polymerize end-to-end along the major groove in most actin filaments. They provide stability to the filaments and regulate access of other actin-binding proteins. In muscle cells, they regulate muscle contraction by controlling the binding of myosin heads to the actin filament. Multiple transcript variants encoding different isoforms have been found for this gene. tropomyosin 4
ENSG00000159251 ACTC1 70 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). actin, alpha, cardiac muscle 1
ENSG00000077522 ACTN2 88 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a muscle-specific, alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Several transcript variants encoding different isoforms have been found for this gene. actinin alpha 2
ENSG00000120729 MYOT 9499 This gene encodes a cytoskeletal protein which plays a significant role in the stability of thin filaments during muscle contraction. This protein binds F-actin, crosslinks actin filaments, and prevents latrunculin A-induced filament disassembly. Mutations in this gene have been associated with limb-girdle muscular dystrophy and myofibrillar myopathies. Several alternatively spliced transcript variants of this gene have been described, but the full-length nature of some of these variants has not been determined. myotilin
ENSG00000143318 CASQ1 844 This gene encodes the skeletal muscle specific member of the calsequestrin protein family. Calsequestrin functions as a luminal sarcoplasmic reticulum calcium sensor in both cardiac and skeletal muscle cells. This protein, also known as calmitine, functions as a calcium regulator in the mitochondria of skeletal muscle. This protein is absent in patients with Duchenne and Becker types of muscular dystrophy. calsequestrin 1
ENSG00000136717 BIN1 274 This gene encodes several isoforms of a nucleocytoplasmic adaptor protein, one of which was initially identified as a MYC-interacting protein with features of a tumor suppressor. Isoforms that are expressed in the central nervous system may be involved in synaptic vesicle endocytosis and may interact with dynamin, synaptojanin, endophilin, and clathrin. Isoforms that are expressed in muscle and ubiquitously expressed isoforms localize to the cytoplasm and nucleus and activate a caspase-independent apoptotic process. Studies in mouse suggest that this gene plays an important role in cardiac muscle development. Alternate splicing of the gene results in several transcript variants encoding different isoforms. Aberrant splice variants expressed in tumor cell lines have also been described. bridging integrator 1
ENSG00000042832 TG 7038 Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. thyroglobulin
ENSG00000173991 TCAP 8557 Sarcomere assembly is regulated by the muscle protein titin. Titin is a giant elastic protein with kinase activity that extends half the length of a sarcomere. It serves as a scaffold to which myofibrils and other muscle related proteins are attached. This gene encodes a protein found in striated and cardiac muscle that binds to the titin Z1-Z2 domains and is a substrate of titin kinase, interactions thought to be critical to sarcomere assembly. Mutations in this gene are associated with limb-girdle muscular dystrophy type 2G. titin-cap
ENSG00000185482 STAC3 246329 The protein encoded by this gene is a component of the excitation-contraction coupling machinery of muscles. This protein is a member of the Stac gene family and contains an N-terminal cysteine-rich domain and two SH3 domains. Mutations in this gene are a cause of Native American myopathy. SH3 and cysteine rich domain 3
ENSG00000149925 ALDOA 226 The protein encoded by this gene, Aldolase A (fructose-bisphosphate aldolase), is a glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Three aldolase isozymes (A, B, and C), encoded by three different genes, are differentially expressed during development. Aldolase A is found in the developing embryo and is produced in even greater amounts in adult muscle. Aldolase A expression is repressed in adult liver, kidney and intestine and similar to aldolase C levels in brain and other nervous tissue. Aldolase A deficiency has been associated with myopathy and hemolytic anemia. Alternative splicing and alternative promoter usage results in multiple transcript variants. Related pseudogenes have been identified on chromosomes 3 and 10. aldolase, fructose-bisphosphate A
ENSG00000092841 MYL6 4637 Myosin is a hexameric ATPase cellular motor protein. It is composed of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable regulatory light chains. This gene encodes a myosin alkali light chain that is expressed in smooth muscle and non-muscle tissues. Genomic sequences representing several pseudogenes have been described and two transcript variants encoding different isoforms have been identified for this gene. myosin light chain 6
ENSG00000211445 GPX3 2878 This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. glutathione peroxidase 3
ENSG00000010318 PHF7 51533 Spermatogenesis is a complex process regulated by extracellular and intracellular factors as well as cellular interactions among interstitial cells of the testis, Sertoli cells, and germ cells. This gene is expressed in the testis in Sertoli cells but not germ cells. The protein encoded by this gene contains plant homeodomain (PHD) finger domains, also known as leukemia associated protein (LAP) domains, believed to be involved in transcriptional regulation. The protein, which localizes to the nucleus of transfected cells, has been implicated in the transcriptional regulation of spermatogenesis. Alternate splicing results in multiple transcript variants of this gene. PHD finger protein 7
ENSG00000084234 APLP2 334 This gene encodes amyloid precursor-like protein 2 (APLP2), which is a member of the APP (amyloid precursor protein) family including APP, APLP1 and APLP2. This protein is ubiquitously expressed. It contains heparin-, copper- and zinc-binding domains at the N-terminus, BPTI/Kunitz inhibitor and E2 domains in the middle region, and transmembrane and intracellular domains at the C-terminus. This protein interacts with major histocompatibility complex (MHC) class I molecules. The synergy of this protein and the APP is required to mediate neuromuscular transmission, spatial learning and synaptic plasticity. This protein has been implicated in the pathogenesis of Alzheimer’s disease. Multiple alternatively spliced transcript variants encoding different isoforms have been identified. amyloid beta precursor like protein 2
ENSG00000118194 TNNT2 7139 The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms; however, the full-length nature of some of these variants has not yet been determined. troponin T2, cardiac type
ENSG00000128591 FLNC 2318 This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene. filamin C
ENSG00000158022 TRIM63 84676 This gene encodes a member of the RING zinc finger protein family found in striated muscle and iris. The product of this gene is an E3 ubiquitin ligase that localizes to the Z-line and M-line lattices of myofibrils. This protein plays an important role in the atrophy of skeletal and cardiac muscle and is required for the degradation of myosin heavy chain proteins, myosin light chain, myosin binding protein, and for muscle-type creatine kinase. tripartite motif containing 63
ENSG00000136156 ITM2B 9445 Amyloid precursor proteins are processed by beta-secretase and gamma-secretase to produce beta-amyloid peptides which form the characteristic plaques of Alzheimer disease. This gene encodes a transmembrane protein which is processed at the C-terminus by furin or furin-like proteases to produce a small secreted peptide which inhibits the deposition of beta-amyloid. Mutations which result in extension of the C-terminal end of the encoded protein, thereby increasing the size of the secreted peptide, are associated with two neurodegenerative diseases, familial British dementia and familial Danish dementia. integral membrane protein 2B
ENSG00000110651 CD81 975 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. This encoded protein is a cell surface glycoprotein that is known to complex with integrins. This protein appears to promote muscle cell fusion and support myotube maintenance. Also it may be involved in signal transduction. This gene is localized in the tumor-suppressor gene region and thus it is a candidate gene for malignancies. Two transcript variants encoding different isoforms have been found for this gene. CD81 molecule
ENSG00000187514 PTMA 5757 NA prothymosin, alpha
ENSG00000187514 LOC728026 728026 NA prothymosin alpha-like
ENSG00000127603 KIAA0754 643314 NA KIAA0754
ENSG00000127603 MACF1 23499 This gene encodes a large protein containing numerous spectrin and leucine-rich repeat (LRR) domains. The encoded protein is a member of a family of proteins that form bridges between different cytoskeletal elements. This protein facilitates actin-microtubule interactions at the cell periphery and couples the microtubule network to cellular junctions. Alternative splicing results in multiple transcript variants, but the full-length nature of some of these variants has not been determined. microtubule-actin crosslinking factor 1
ENSG00000110321 EIF4G2 1982 Translation initiation is mediated by specific recognition of the cap structure by eukaryotic translation initiation factor 4F (eIF4F), which is a cap binding protein complex that consists of three subunits: eIF4A, eIF4E and eIF4G. The protein encoded by this gene shares similarity with the C-terminal region of eIF4G that contains the binding sites for eIF4A and eIF3; eIF4G, in addition, contains a binding site for eIF4E at the N-terminus. Unlike eIF4G, which supports cap-dependent and independent translation, this gene product functions as a general repressor of translation by forming translationally inactive complexes. In vitro and in vivo studies indicate that translation of this mRNA initiates exclusively at a non-AUG (GUG) codon. Alternatively spliced transcript variants encoding different isoforms of this gene have been described. eukaryotic translation initiation factor 4 gamma 2
ENSG00000183386 FHL3 2275 The protein encoded by this gene is a member of a family of proteins containing a four-and-a-half LIM domain, which is a highly conserved double zinc finger motif. The encoded protein has been shown to interact with the cancer developmental regulators SMAD2, SMAD3, and SMAD4, the skeletal muscle myogenesis protein MyoD, and the high-affinity IgE beta chain regulator MZF-1. This protein may be involved in tumor suppression, repression of MyoD expression, and repression of IgE receptor expression. Two transcript variants encoding different isoforms have been found for this gene. four and a half LIM domains 3
ENSG00000067560 RHOA 387 This gene encodes a member of the Rho family of small GTPases, which cycle between inactive GDP-bound and active GTP-bound states and function as molecular switches in signal transduction cascades. Rho proteins promote reorganization of the actin cytoskeleton and regulate cell shape, attachment, and motility. Overexpression of this gene is associated with tumor cell proliferation and metastasis. Multiple alternatively spliced variants have been identified. ras homolog family member A
ENSG00000143878 RHOB 388 NA ras homolog family member B
ENSG00000137076 TLN1 7094 This gene encodes a cytoskeletal protein that is concentrated in areas of cell-substratum and cell-cell contacts. The encoded protein plays a significant role in the assembly of actin filaments and in spreading and migration of various cell types, including fibroblasts and osteoclasts. It codistributes with integrins in the cell surface membrane in order to assist in the attachment of adherent cells to extracellular matrices and of lymphocytes to other cells. The N-terminus of this protein contains elements for localization to cell-extracellular matrix junctions. The C-terminus contains binding sites for proteins such as beta-1-integrin, actin, and vinculin. talin 1
ENSG00000161960 EIF4A1 1973 NA eukaryotic translation initiation factor 4A1
ENSG00000142192 APP 351 This gene encodes a cell surface receptor and transmembrane precursor protein that is cleaved by secretases to form a number of peptides. Some of these peptides are secreted and can bind to the acetyltransferase complex APBB1/TIP60 to promote transcriptional activation, while others form the protein basis of the amyloid plaques found in the brains of patients with Alzheimer disease. In addition, two of the peptides are antimicrobial peptides, having been shown to have bactericidal and antifungal activities. Mutations in this gene have been implicated in autosomal dominant Alzheimer disease and cerebroarterial amyloidosis (cerebral amyloid angiopathy). Multiple transcript variants encoding several different isoforms have been found for this gene. amyloid beta precursor protein
ENSG00000115524 SF3B1 23451 This gene encodes subunit 1 of the splicing factor 3b protein complex. Splicing factor 3b, together with splicing factor 3a and a 12S RNA unit, forms the U2 small nuclear ribonucleoproteins complex (U2 snRNP). The splicing factor 3b/3a complex binds pre-mRNA upstream of the intron’s branch site in a sequence independent manner and may anchor the U2 snRNP to the pre-mRNA. Splicing factor 3b is also a component of the minor U12-type spliceosome. The carboxy-terminal two-thirds of subunit 1 have 22 non-identical, tandem HEAT repeats that form rod-like, helical structures. Alternative splicing results in multiple transcript variants encoding different isoforms. splicing factor 3b subunit 1
ENSG00000185896 LAMP1 3916 The protein encoded by this gene is a member of a family of membrane glycoproteins. This glycoprotein provides selectins with carbohydrate ligands. It may also play a role in tumor cell metastasis. lysosomal associated membrane protein 1
ENSG00000153187 HNRNPU 3192 This gene belongs to the subfamily of ubiquitously expressed heterogeneous nuclear ribonucleoproteins (hnRNPs). The hnRNPs are RNA binding proteins and they form complexes with heterogeneous nuclear RNA (hnRNA). These proteins are associated with pre-mRNAs in the nucleus and appear to influence pre-mRNA processing and other aspects of mRNA metabolism and transport. While all of the hnRNPs are present in the nucleus, some seem to shuttle between the nucleus and the cytoplasm. The hnRNP proteins have distinct nucleic acid binding properties. The protein encoded by this gene contains a RNA binding domain and scaffold-associated region (SAR)-specific bipartite DNA-binding domain. This protein is also thought to be involved in the packaging of hnRNA into large ribonucleoprotein complexes. During apoptosis, this protein is cleaved in a caspase-dependent way. Cleavage occurs at the SALD site, resulting in a loss of DNA-binding activity and a concomitant detachment of this protein from nuclear structural sites. But this cleavage does not affect the function of the encoded protein in RNA metabolism. At least two alternatively spliced transcript variants have been identified for this gene. heterogeneous nuclear ribonucleoprotein U
ENSG00000152291 TGOLN2 10618 This gene encodes a type I integral membrane protein that is localized to the trans-Golgi network, a major sorting station for secretory and membrane proteins. The encoded protein cycles between early endosomes and the trans-Golgi network, and may play a role in exocytic vesicle formation. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. trans-golgi network protein 2
ENSG00000127022 CANX 821 This gene encodes a member of the calnexin family of molecular chaperones. The encoded protein is a calcium-binding, endoplasmic reticulum (ER)-associated protein that interacts transiently with newly synthesized N-linked glycoproteins, facilitating protein folding and assembly. It may also play a central role in the quality control of protein folding by retaining incorrectly folded protein subunits within the ER for degradation. Alternatively spliced transcript variants encoding the same protein have been described. calnexin
ENSG00000065978 YBX1 4904 This gene encodes a highly conserved cold shock domain protein that has broad nucleic acid binding properties. The encoded protein functions as both a DNA and RNA binding protein and has been implicated in numerous cellular processes including regulation of transcription and translation, pre-mRNA splicing, DNA reparation and mRNA packaging. This protein is also a component of messenger ribonucleoprotein (mRNP) complexes and may have a role in microRNA processing. This protein can be secreted through non-classical pathways and functions as an extracellular mitogen. Aberrant expression of the gene is associated with cancer proliferation in numerous tissues. This gene may be a prognostic marker for poor outcome and drug resistance in certain cancers. Alternate splicing results in multiple transcript variants. Pseudogenes of this gene are found on multiple chromosomes. Y-box binding protein 1
ENSG00000011465 DCN 1634 This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. decorin
ENSG00000197694 SPTAN1 6709 Spectrins are a family of filamentous cytoskeletal proteins that function as essential scaffold proteins that stabilize the plasma membrane and organize intracellular organelles. Spectrins are composed of alpha and beta dimers that associate to form tetramers linked in a head-to-head arrangement. This gene encodes an alpha spectrin that is specifically expressed in nonerythrocytic cells. The encoded protein has been implicated in other cellular functions including DNA repair and cell cycle regulation. Mutations in this gene are the cause of early infantile epileptic encephalopathy-5. Alternate splicing results in multiple transcript variants. spectrin alpha, non-erythrocytic 1
ENSG00000118816 CCNI 10983 The protein encoded by this gene belongs to the highly conserved cyclin family, whose members are characterized by a dramatic periodicity in protein abundance through the cell cycle. Cyclins function as regulators of CDK kinases. Different cyclins exhibit distinct expression and degradation patterns which contribute to the temporal coordination of each mitotic event. This cyclin shows the highest similarity with cyclin G. The transcript of this gene was found to be expressed constantly during cell cycle progression. The function of this cyclin has not yet been determined. cyclin I
ENSG00000134571 MYBPC3 4607 MYBPC3 encodes the cardiac isoform of myosin-binding protein C. Myosin-binding protein C is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. MYBPC3, the cardiac isoform, is expressed exclusively in heart muscle. Regulatory phosphorylation of the cardiac isoform in vivo by cAMP-dependent protein kinase (PKA) upon adrenergic stimulation may be linked to modulation of cardiac contraction. Mutations in MYBPC3 are one cause of familial hypertrophic cardiomyopathy. myosin binding protein C, cardiac
ENSG00000163220 S100A9 6280 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. S100 calcium binding protein A9
ENSG00000175899 A2M 2 Alpha-2-macroglobulin is a protease inhibitor and cytokine transporter. It inhibits many proteases, including trypsin, thrombin and collagenase. A2M is implicated in Alzheimer disease (AD) due to its ability to mediate the clearance and degradation of A-beta, the major component of beta-amyloid deposits. alpha-2-macroglobulin
ENSG00000124942 AHNAK 79026 NA AHNAK nucleoprotein
ENSG00000198336 MYL4 4635 Myosin is a hexameric ATPase cellular motor protein. It is composed of two myosin heavy chains, two nonphosphorylatable myosin alkali light chains, and two phosphorylatable myosin regulatory light chains. This gene encodes a myosin alkali light chain that is found in embryonic muscle and adult atria. Two alternatively spliced transcript variants encoding the same protein have been found for this gene. myosin light chain 4
ENSG00000130402 ACTN4 81 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, alpha actinin isoform which is concentrated in the cytoplasm, and thought to be involved in metastatic processes. Mutations in this gene have been associated with focal and segmental glomerulosclerosis. actinin alpha 4
ENSG00000165119 HNRNPK 3190 This gene belongs to the subfamily of ubiquitously expressed heterogeneous nuclear ribonucleoproteins (hnRNPs). The hnRNPs are RNA binding proteins and they complex with heterogeneous nuclear RNA (hnRNA). These proteins are associated with pre-mRNAs in the nucleus and appear to influence pre-mRNA processing and other aspects of mRNA metabolism and transport. While all of the hnRNPs are present in the nucleus, some seem to shuttle between the nucleus and the cytoplasm. The hnRNP proteins have distinct nucleic acid binding properties. The protein encoded by this gene is located in the nucleoplasm and has three repeats of KH domains that bind to RNAs. It is distinct among other hnRNP proteins in its binding preference; it binds tenaciously to poly(C). This protein is also thought to have a role during cell cycle progression. Several alternatively spliced transcript variants have been described for this gene, however, not all of them are fully characterized. heterogeneous nuclear ribonucleoprotein K
ENSG00000240045 LOC100507537 100507537 NA uncharacterized LOC100507537
ENSG00000010322 NISCH 11188 This gene encodes a nonadrenergic imidazoline-1 receptor protein that localizes to the cytosol and anchors to the inner layer of the plasma membrane. The orthologous mouse protein has been shown to influence cytoskeletal organization and cell migration by binding to alpha-5-beta-1 integrin. In humans, this protein has been shown to bind to the adapter insulin receptor substrate 4 (IRS4) to mediate translocation of alpha-5 integrin from the cell membrane to endosomes. Expression of this protein was reduced in human breast cancers while its overexpression reduced tumor growth and metastasis; possibly by limiting the expression of alpha-5 integrin. In human cardiac tissue, this gene was found to affect cell growth and death while in neural tissue it affected neuronal growth and differentiation. Alternative splicing results in multiple transcript variants encoding different isoforms. Some isoforms lack the expected C-terminal domains of a functional imidazoline receptor. nischarin
ENSG00000197111 PCBP2 5094 The protein encoded by this gene appears to be multifunctional. Along with PCBP-1 and hnRNPK, it is one of the major cellular poly(rC)-binding proteins. The encoded protein contains three K-homologous (KH) domains which may be involved in RNA binding. Together with PCBP-1, this protein also functions as a translational coactivator of poliovirus RNA via a sequence-specific interaction with stem-loop IV of the IRES, promoting poliovirus RNA replication by binding to its 5’-terminal cloverleaf structure. It has also been implicated in translational control of the 15-lipoxygenase mRNA, human papillomavirus type 16 L2 mRNA, and hepatitis A virus RNA. The encoded protein is also suggested to play a part in formation of a sequence-specific alpha-globin mRNP complex which is associated with alpha-globin mRNA stability. This multiexon structural mRNA is thought to be retrotransposed to generate PCBP-1, an intronless gene with functions similar to that of PCBP2. This gene and PCBP-1 have paralogous genes (PCBP3 and PCBP4) which are thought to have arisen as a result of duplication events of entire genes. This gene also has two processed pseudogenes (PCBP2P1 and PCBP2P2). Multiple transcript variants encoding different isoforms have been found for this gene. poly(rC) binding protein 2
ENSG00000233476 EEF1A1P6 ENSG00000233476 NA eukaryotic translation elongation factor 1 alpha 1 pseudogene 6
ENSG00000181222 POLR2A 5430 This gene encodes the largest subunit of RNA polymerase II, the polymerase responsible for synthesizing messenger RNA in eukaryotes. The product of this gene contains a carboxy terminal domain composed of heptapeptide repeats that are essential for polymerase activity. These repeats contain serine and threonine residues that are phosphorylated in actively transcribing RNA polymerase. In addition, this subunit, in combination with several other polymerase subunits, forms the DNA binding domain of the polymerase, a groove in which the DNA template is transcribed into RNA. polymerase (RNA) II subunit A
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_",19,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
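
Because one Ensembl ID can map to more than one record (ENSG00000187514 and ENSG00000127603 each appear twice in the table above), writing `out$query` directly repeats those IDs in the output file. The following is a minimal sketch of de-duplicating before writing. It assumes a small mock data frame (`out_mock`, hypothetical) in place of the live query result and a temporary file in place of the project path; it is an illustration, not part of the original pipeline.

```r
# Mock of the query result shape (hypothetical object; IDs and symbols
# are copied from rows of the Factor 19 table). No network call is made.
out_mock <- data.frame(
  query  = c("ENSG00000187514", "ENSG00000187514",
             "ENSG00000127603", "ENSG00000127603",
             "ENSG00000110321"),
  symbol = c("PTMA", "LOC728026", "KIAA0754", "MACF1", "EIF4G2"),
  stringsAsFactors = FALSE
)

# Keep each queried Ensembl ID once, in order of first appearance.
unique_ids <- unique(out_mock$query)

# Write to a temporary file rather than the project directory.
out_file <- tempfile(fileext = ".txt")
write.table(unique_ids, out_file,
            col.names = FALSE, row.names = FALSE, quote = FALSE)

# Read the file back to inspect what was written.
written_ids <- readLines(out_file)
print(written_ids)
```

Whether duplicates should be dropped depends on how the downstream tool consumes the gene list; if it counts lines as genes, the de-duplicated file is likely the safer input.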

Factor 20 Annotations

out <- mygene::queryMany(gene_list[20,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
summary X_id query symbol name notfound
This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. 3858 ENSG00000186395 KRT10 keratin 10 NA
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3848 ENSG00000167768 KRT1 keratin 1 NA
The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. 4155 ENSG00000197971 MBP myelin basic protein NA
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3849 ENSG00000172867 KRT2 keratin 2 NA
Protamines substitute for histones in the chromatin of sperm during the haploid phase of spermatogenesis, and are the major DNA-binding proteins in the nucleus of sperm in many vertebrates. They package the sperm DNA into a highly condensed complex in a volume less than 5% of a somatic cell nucleus. Many mammalian species have only one protamine (protamine 1); however, a few species, including human and mouse, have two. This gene encodes protamine 2, which is cleaved to give rise to a family of protamine 2 peptides. Alternatively spliced transcript variants have also been found for this gene. 5620 ENSG00000122304 PRM2 protamine 2 NA
NA ENSG00000266844 ENSG00000266844 RP11-862L9.3 NA NA
The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. 3860 ENSG00000171401 KRT13 keratin 13 NA
NA 64065 ENSG00000112378 PERP PERP, TP53 apoptosis effector NA
This gene is upregulated in inflammatory diseases, and it was first observed as expressed in the differentiated layers of skin. The most interesting aspect of this gene is the differential use of promoters and terminators to generate isoforms with unique cellular distributions and domain components. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. 93099 ENSG00000161249 DMKN dermokine NA
NA 5619 ENSG00000175646 PRM1 protamine 1 NA
This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. 1674 ENSG00000175084 DES desmin NA
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the basal layer of the epidermis with family member KRT14. Mutations in these genes have been associated with a complex of diseases termed epidermolysis bullosa simplex. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3852 ENSG00000186081 KRT5 keratin 5 NA
This gene encodes a novel calcium binding protein expressed in the epidermis and related to the calmodulin family of calcium binding proteins. Functional studies with recombinant protein demonstrate it does bind calcium and undergoes a conformational change when it does so. Abundant expression is detected only in reconstructed epidermis and is restricted to differentiating keratinocytes. In addition, it can associate with transglutaminase 3, shown to be a key enzyme in the terminal differentiation of keratinocytes. 51806 ENSG00000178372 CALML5 calmodulin like 5 NA
This gene encodes a member of the keratin family, the most diverse group of intermediate filaments. This gene product, a type I keratin, is usually found as a heterotetramer with two keratin 5 molecules, a type II keratin. Together they form the cytoskeleton of epithelial cells. Mutations in the genes for these keratins are associated with epidermolysis bullosa simplex. At least one pseudogene has been identified at 17p12-p11. 3861 ENSG00000186847 KRT14 keratin 14 NA
This gene is a member of the PDK/BCKDK protein kinase family and encodes a mitochondrial protein with a histidine kinase domain. This protein is located in the matrix of the mitochondria and inhibits the pyruvate dehydrogenase complex by phosphorylating one of its subunits, thereby contributing to the regulation of glucose metabolism. Expression of this gene is regulated by glucocorticoids, retinoic acid and insulin. 5166 ENSG00000004799 PDK4 pyruvate dehydrogenase kinase 4 NA
This gene produces a long non-coding RNA (lncRNA) transcribed from the multiple endocrine neoplasia locus. This lncRNA is retained in the nucleus where it forms the core structural component of the paraspeckle sub-organelles. It may act as a transcriptional regulator for numerous genes, including some genes involved in cancer progression. 283131 ENSG00000245532 NEAT1 nuclear paraspeckle assembly transcript 1 (non-protein coding) NA
Spermatogenesis is a complex process regulated by extracellular and intracellular factors as well as cellular interactions among interstitial cells of the testis, Sertoli cells, and germ cells. This gene is expressed in the testis in Sertoli cells but not germ cells. The protein encoded by this gene contains plant homeodomain (PHD) finger domains, also known as leukemia associated protein (LAP) domains, believed to be involved in transcriptional regulation. The protein, which localizes to the nucleus of transfected cells, has been implicated in the transcriptional regulation of spermatogenesis. Alternate splicing results in multiple transcript variants of this gene. 51533 ENSG00000010318 PHF7 PHD finger protein 7 NA
This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. 7168 ENSG00000140416 TPM1 tropomyosin 1 (alpha) NA
NA 58473 ENSG00000021300 PLEKHB1 pleckstrin homology domain containing B1 NA
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3851 ENSG00000170477 KRT4 keratin 4 NA
This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. 2670 ENSG00000131095 GFAP glial fibrillary acidic protein NA
This gene encodes loricrin, a major protein component of the cornified cell envelope found in terminally differentiated epidermal cells. Mutations in this gene are associated with Vohwinkel’s syndrome and progressive symmetric erythrokeratoderma, both inherited skin diseases. 4014 ENSG00000203782 LOR loricrin NA
NA 7178 ENSG00000133112 TPT1 tumor protein, translationally-controlled 1 NA
This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. 60 ENSG00000075624 ACTB actin, beta NA
The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and the kininogens. The type 2 cystatin proteins are a class of cysteine proteinase inhibitors found in a variety of human fluids and secretions, where they appear to provide protective functions. The cystatin locus on chromosome 20 contains the majority of the type 2 cystatin genes and pseudogenes. This gene is located in the cystatin locus and encodes the most abundant extracellular inhibitor of cysteine proteases, which is found in high concentrations in biological fluids and is expressed in virtually all organs of the body. A mutation in this gene has been associated with amyloid angiopathy. Expression of this protein in vascular wall smooth muscle cells is severely reduced in both atherosclerotic and aneurysmal aortic lesions, establishing its role in vascular disease. In addition, this protein has been shown to have an antimicrobial function, inhibiting the replication of herpes simplex virus. Alternative splicing results in multiple transcript variants encoding a single protein. 1471 ENSG00000101439 CST3 cystatin C NA
This gene encodes ubiquitin, one of the most conserved proteins known. Ubiquitin has a major role in targeting cellular proteins for degradation by the 26S proteasome. It is also involved in the maintenance of chromatin structure, the regulation of gene expression, and the stress response. Ubiquitin is synthesized as a precursor protein consisting of either polyubiquitin chains or a single ubiquitin moiety fused to an unrelated protein. This gene consists of three direct repeats of the ubiquitin coding sequence with no spacer sequence. Consequently, the protein is expressed as a polyubiquitin precursor with a final amino acid after the last repeat. An aberrant form of this protein has been detected in patients with Alzheimer’s disease and Down syndrome. Pseudogenes of this gene are located on chromosomes 1, 2, 13, and 17. Alternative splicing results in multiple transcript variants. 7314 ENSG00000170315 UBB ubiquitin B NA
This gene encodes a protein which may function in the regulation of keratinocyte differentiation and maintenance of stratified epithelia. Multiple transcript variants encoding different isoforms have been found for this gene. 388533 ENSG00000188508 KRTDAP keratinocyte differentiation associated protein NA
NA 222166 ENSG00000180354 MTURN maturin, neural progenitor differentiation regulator homolog (Xenopus) NA
This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 7169 ENSG00000198467 TPM2 tropomyosin 2 (beta) NA
This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This protein may be involved in molecular recruitment and stabilization during desmosome formation. Mutations in this gene have been associated with the ectodermal dysplasia/skin fragility syndrome. Two transcript variants encoding different isoforms have been found for this gene. 5317 ENSG00000081277 PKP1 plakophilin 1 NA
The protein encoded by this gene is a plasma membrane protein that is important in spermatogenesis, embryo implantation, neural network formation, and tumor progression. The encoded protein is also a member of the immunoglobulin superfamily. Multiple transcript variants encoding different isoforms have been found for this gene. 682 ENSG00000172270 BSG basigin (Ok blood group) NA
This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. 2335 ENSG00000115414 FN1 fibronectin 1 NA
This antimicrobial gene encodes a secreted protein that is subsequently processed into mature peptides of distinct biological activities. The C-terminal peptide is constitutively expressed in sweat and has antibacterial and antifungal activities. The N-terminal peptide, also known as diffusible survival evasion peptide, promotes neural cell survival under conditions of severe oxidative stress. A glycosylated form of the N-terminal peptide may be associated with cachexia (muscle wasting) in cancer patients. Alternative splicing results in multiple transcript variants encoding different isoforms. 117159 ENSG00000161634 DCD dermcidin NA
This gene encodes alpha-enolase, one of three enolase isoenzymes found in mammals. Each isoenzyme is a homodimer composed of 2 alpha, 2 gamma, or 2 beta subunits, and functions as a glycolytic enzyme. Alpha-enolase in addition, functions as a structural lens protein (tau-crystallin) in the monomeric form. Alternative splicing of this gene results in a shorter isoform that has been shown to bind to the c-myc promoter and function as a tumor suppressor. Several pseudogenes have been identified, including one on the long arm of chromosome 1. Alpha-enolase has also been identified as an autoantigen in Hashimoto encephalopathy. 2023 ENSG00000074800 ENO1 enolase 1 NA
This gene encodes a member of the myristoylated alanine-rich C-kinase substrate (MARCKS) family. Members of this family play a role in cytoskeletal regulation, protein kinase C signaling and calmodulin signaling. The encoded protein affects the formation of adherens junction. Alternative splicing results in multiple transcript variants. Pseudogenes of this gene are located on the long arm of chromosomes 6 and 10. 65108 ENSG00000175130 MARCKSL1 MARCKS like 1 NA
NA 6707 ENSG00000163209 SPRR3 small proline rich protein 3 NA
This gene encodes a membrane bound protein with several transient phosphorylation sites and PEST motifs. Conservation of proteins with PEST sequences among different species supports their functional significance. PEST sequences typically occur in proteins with high turnover rates. Immunological characteristics of this protein are species specific. This protein also undergoes N-terminal myristoylation. Alternative splicing results in multiple transcript variants that encode the same protein. 10409 ENSG00000176788 BASP1 brain abundant membrane attached signal protein 1 NA
This gene encodes a member of the glutathione peroxidase protein family. Glutathione peroxidase catalyzes the reduction of hydrogen peroxide, organic hydroperoxide, and lipid peroxides by reduced glutathione and functions in the protection of cells against oxidative damage. Human plasma glutathione peroxidase has been shown to be a selenium-containing enzyme and the UGA codon is translated into a selenocysteine. The encoded protein has been identified as a moonlighting protein based on its ability to serve dual functions as a peroxidase as well as a structural protein in mature spermatozoa. Through alternative splicing and transcription initiation, rat produces proteins that localize to the nucleus, mitochondrion, and cytoplasm. In humans, alternative transcription initiation and the cleavage sites of the mitochondrial and nuclear transit peptides need to be experimentally verified. Alternative splicing results in multiple transcript variants. 2879 ENSG00000167468 GPX4 glutathione peroxidase 4 NA
This locus has a highly complex imprinted expression pattern. It gives rise to maternally, paternally, and biallelically expressed transcripts that are derived from four alternative promoters and 5’ exons. Some transcripts contain a differentially methylated region (DMR) at their 5’ exons, and this DMR is commonly found in imprinted genes and correlates with transcript expression. An antisense transcript is produced from an overlapping locus on the opposite strand. One of the transcripts produced from this locus, and the antisense transcript, are paternally expressed noncoding RNAs, and may regulate imprinting in this region. In addition, one of the transcripts contains a second overlapping ORF, which encodes a structurally unrelated protein - Alex. Alternative splicing of downstream exons is also observed, which results in different forms of the stimulatory G-protein alpha subunit, a key element of the classical signal transduction pathway linking receptor-ligand interactions with the activation of adenylyl cyclase and a variety of cellular responses. Multiple transcript variants encoding different isoforms have been found for this gene. Mutations in this gene result in pseudohypoparathyroidism type 1a, pseudohypoparathyroidism type 1b, Albright hereditary osteodystrophy, pseudopseudohypoparathyroidism, McCune-Albright syndrome, progressive osseous heteroplasia, polyostotic fibrous dysplasia of bone, and some pituitary tumors. 2778 ENSG00000087460 GNAS GNAS complex locus NA
Integrins are heterodimers comprised of alpha and beta subunits, that are noncovalently associated transmembrane glycoprotein receptors. Different combinations of alpha and beta polypeptides form complexes that vary in their ligand-binding specificities. Integrins mediate cell-matrix or cell-cell adhesion, and transduced signals that regulate gene expression and cell growth. This gene encodes the integrin beta 4 subunit, a receptor for the laminins. This subunit tends to associate with alpha 6 subunit and is likely to play a pivotal role in the biology of invasive carcinoma. Mutations in this gene are associated with epidermolysis bullosa with pyloric atresia. Multiple alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. 3691 ENSG00000132470 ITGB4 integrin subunit beta 4 NA
This gene encodes a highly conserved preproprotein that is proteolytically processed to generate four main cleavage products including saposins A, B, C, and D. Each domain of the precursor protein is approximately 80 amino acid residues long with nearly identical placement of cysteine residues and glycosylation sites. Saposins A-D localize primarily to the lysosomal compartment where they facilitate the catabolism of glycosphingolipids with short oligosaccharide groups. The precursor protein exists both as a secretory protein and as an integral membrane protein and has neurotrophic activities. Mutations in this gene have been associated with Gaucher disease and metachromatic leukodystrophy. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. 5660 ENSG00000197746 PSAP prosaposin NA
This gene encodes a protein that is a member of the dickkopf family. The secreted protein contains two cysteine rich regions and is involved in embryonic development through its interactions with the Wnt signaling pathway. The expression of this gene is decreased in a variety of cancer cell lines and it may function as a tumor suppressor gene. Alternative splicing results in multiple transcript variants encoding the same protein. 27122 ENSG00000050165 DKK3 dickkopf WNT signaling pathway inhibitor 3 NA
Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal phosphoprotein that is a component of the 60S subunit. The protein, which is a functional equivalent of the E. coli L7/L12 ribosomal protein, belongs to the L12P family of ribosomal proteins. It plays an important role in the elongation step of protein synthesis. Unlike most ribosomal proteins, which are basic, the encoded protein is acidic. Its C-terminal end is nearly identical to the C-terminal ends of the ribosomal phosphoproteins P0 and P2. The P1 protein can interact with P0 and P2 to form a pentameric complex consisting of P1 and P2 dimers, and a P0 monomer. The protein is located in the cytoplasm. Two alternatively spliced transcript variants that encode different proteins have been observed. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. 6176 ENSG00000137818 RPLP1 ribosomal protein lateral stalk subunit P1 NA
This gene encodes a transcription factor that is a member of the nuclear receptor subfamily 1. The encoded protein is a ligand-sensitive transcription factor that negatively regulates the expression of core clock proteins. In particular this protein represses the circadian clock transcription factor aryl hydrocarbon receptor nuclear translocator-like protein 1 (ARNTL). This protein may also be involved in regulating genes that function in metabolic, inflammatory and cardiovascular processes. 9572 ENSG00000126368 NR1D1 nuclear receptor subfamily 1 group D member 1 NA
This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. 1277 ENSG00000108821 COL1A1 collagen type I alpha 1 NA
The protein encoded by this gene belongs to the TRIM protein family. It has multiple zinc finger motifs and a leucine zipper motif. It has been proposed to form homo- or heterodimers which are involved in nucleic acid binding. Thus, it may act as a transcriptional regulatory factor involved in carcinogenesis and/or differentiation. It may also function in the suppression of radiosensitivity since it is associated with ataxia telangiectasia phenotype. 23650 ENSG00000137699 TRIM29 tripartite motif containing 29 NA
This gene encodes a member of the S1, or chymotrypsin, family of serine peptidases. This protease catalyzes the cleavage of factor B, the rate-limiting step of the alternative pathway of complement activation. This protein also functions as an adipokine, a cell signaling protein secreted by adipocytes, which regulates insulin secretion in mice. Mutations in this gene underlie complement factor D deficiency, which is associated with recurrent bacterial meningitis infections in human patients. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate the mature protease. 1675 ENSG00000197766 CFD complement factor D NA
This gene encodes the pro-alpha1 chains of type III collagen, a fibrillar collagen that is found in extensible connective tissues such as skin, lung, uterus, intestine and the vascular system, frequently in association with type I collagen. Mutations in this gene are associated with Ehlers-Danlos syndrome types IV, and with aortic and arterial aneurysms. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. 1281 ENSG00000168542 COL3A1 collagen type III alpha 1 chain NA
The protein encoded by this gene is a major apoprotein of the chylomicron. It binds to a specific liver and peripheral cell receptor, and is essential for the normal catabolism of triglyceride-rich lipoprotein constituents. This gene maps to chromosome 19 in a cluster with the related apolipoprotein C1 and C2 genes. Mutations in this gene result in familial dysbetalipoproteinemia, or type III hyperlipoproteinemia (HLP III), in which increased plasma cholesterol and triglycerides are the consequence of impaired clearance of chylomicron and VLDL remnants. Alternative splicing results in multiple transcript variants. 348 ENSG00000130203 APOE apolipoprotein E NA
This gene encodes a glycoprotein with an approximate molecular weight of 76.5 kDa. It is thought to have been created as a result of an ancient gene duplication event that led to generation of homologous C and N-terminal domains each of which binds one ion of ferric iron. The function of this protein is to transport iron from the intestine, reticuloendothelial system, and liver parenchymal cells to all proliferating cells in the body. This protein may also have a physiologic role as granulocyte/pollen-binding protein (GPBP) involved in the removal of certain organic matter and allergens from serum. 7018 ENSG00000091513 TF transferrin NA
The protein encoded by this gene localizes to focal adhesions, regions of the plasma membrane where the cell attaches to the extracellular matrix. This protein crosslinks actin filaments and contains a Src homology 2 (SH2) domain, which is often found in molecules involved in signal transduction. This protein is a substrate of calpain II. Alternative splicing results in multiple transcript variants encoding different isoforms. 7145 ENSG00000079308 TNS1 tensin 1 NA
NA 150094 ENSG00000142178 SIK1 salt inducible kinase 1 NA
NA ENSG00000229732 ENSG00000229732 AC019349.5 NA NA
The protein encoded by this gene binds to the ‘plus’ ends of actin monomers and filaments to prevent monomer exchange. The encoded calcium-regulated protein functions in both assembly and disassembly of actin filaments. Defects in this gene are a cause of familial amyloidosis Finnish type (FAF). Multiple transcript variants encoding several different isoforms have been found for this gene. 2934 ENSG00000148180 GSN gelsolin NA
Calcium-dependent membrane-binding proteins may regulate molecular events at the interface of the cell membrane and cytoplasm. This gene is one of several genes that encode a calcium-dependent protein containing two N-terminal type II C2 domains and an integrin A domain-like sequence in the C-terminus. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene. More variants may exist, but their full-length natures could not be determined. 57699 ENSG00000124772 CPNE5 copine 5 NA
This gene encodes a conserved serine/threonine kinase that is a member of the homeodomain-interacting protein kinase family. The encoded protein interacts with homeodomain transcription factors and many other transcription factors such as p53, and can function as both a corepressor and a coactivator depending on the transcription factor and its subcellular localization. Multiple transcript variants encoding different isoforms have been found for this gene. 28996 ENSG00000064393 HIPK2 homeodomain interacting protein kinase 2 NA
NA 55076 ENSG00000181458 TMEM45A transmembrane protein 45A NA
The protein encoded by this gene is a zinc finger transcription factor and contains an N-terminal POZ domain. This protein acts as a sequence-specific repressor of transcription, and has been shown to modulate the transcription of STAT-dependent IL-4 responses of B cells. This protein can interact with a variety of POZ-containing proteins that function as transcription corepressors. This gene is found to be frequently translocated and hypermutated in diffuse large-cell lymphoma (DLCL), and may be involved in the pathogenesis of DLCL. Alternatively spliced transcript variants encoding different protein isoforms have been found for this gene. 604 ENSG00000113916 BCL6 B-cell CLL/lymphoma 6 NA
NA 171024 ENSG00000172403 SYNPO2 synaptopodin 2 NA
NA ENSG00000265401 ENSG00000265401 RP11-138I1.4 NA NA
NA 79957 ENSG00000160781 PAQR6 progestin and adipoQ receptor family member 6 NA
This gene encodes the heavy subunit of ferritin, the major intracellular iron storage protein in prokaryotes and eukaryotes. It is composed of 24 subunits of the heavy and light ferritin chains. Variation in ferritin subunit composition may affect the rates of iron uptake and release in different tissues. A major function of ferritin is the storage of iron in a soluble and nontoxic state. Defects in ferritin proteins are associated with several neurodegenerative diseases. This gene has multiple pseudogenes. Several alternatively spliced transcript variants have been observed, but their biological validity has not been determined. 2495 ENSG00000167996 FTH1 ferritin heavy chain 1 NA
NA 151516 ENSG00000244617 ASPRV1 aspartic peptidase, retroviral-like 1 NA
The protein encoded by this gene is a member of the NR1 subfamily of nuclear hormone receptors. It can bind as a monomer or as a homodimer to hormone response elements upstream of several genes to enhance the expression of those genes. The encoded protein has been shown to interact with NM23-2, a nucleoside diphosphate kinase involved in organogenesis and differentiation, as well as with NM23-1, the product of a tumor metastasis suppressor candidate gene. Also, it has been shown to aid in the transcriptional regulation of some genes involved in circadian rhythm. Four transcript variants encoding different isoforms have been described for this gene. 6095 ENSG00000069667 RORA RAR related orphan receptor A NA
Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein, which is the functional equivalent of the E. coli L10 ribosomal protein, belongs to the L10P family of ribosomal proteins. It is a neutral phosphoprotein with a C-terminal end that is nearly identical to the C-terminal ends of the acidic ribosomal phosphoproteins P1 and P2. The P0 protein can interact with P1 and P2 to form a pentameric complex consisting of P1 and P2 dimers, and a P0 monomer. The protein is located in the cytoplasm. Transcript variants derived from alternative splicing exist; they encode the same protein. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. 6175 ENSG00000089157 RPLP0 ribosomal protein lateral stalk subunit P0 NA
This gene encodes the alpha-3 chain, one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The alpha-3 chain of type VI collagen is much larger than the alpha-1 and -2 chains. This difference in size is largely due to an increase in the number of subdomains, similar to von Willebrand Factor type A domains, that are found in the amino terminal globular domain of all the alpha chains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in the type VI collagen genes are associated with Bethlem myopathy, a rare autosomal dominant proximal myopathy with early childhood onset. Mutations in this gene are also a cause of Ullrich congenital muscular dystrophy, also referred to as Ullrich scleroatonic muscular dystrophy, an autosomal recessive congenital myopathy that is more severe than Bethlem myopathy. Multiple transcript variants have been identified, but the full-length nature of only some of these variants has been described. 1293 ENSG00000163359 COL6A3 collagen type VI alpha 3 chain NA
NA 79026 ENSG00000124942 AHNAK AHNAK nucleoprotein NA
NA NA ENSG00000117289 NA NA TRUE
The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. 6280 ENSG00000163220 S100A9 S100 calcium binding protein A9 NA
This gene encodes a member of the ‘fused gene’ family of proteins, which contain N-terminus EF-hand domains and multiple tandem peptide repeats. The encoded protein contains two EF-hand Ca2+ binding domains in its N-terminus and two glutamine- and threonine-rich 60 amino acid repeats in its C-terminus. This gene, also known as squamous epithelial heat shock protein 53, may play a role in the mucosal/epithelial immune response and epidermal differentiation. 49860 ENSG00000143536 CRNN cornulin NA
This gene encodes the pulmonary-associated surfactant protein C (SPC), an extremely hydrophobic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 2, also called pulmonary alveolar proteinosis due to surfactant protein C deficiency, and are associated with interstitial lung disease in older infants, children, and adults. Alternatively spliced transcript variants encoding different protein isoforms have been identified. 6440 ENSG00000168484 SFTPC surfactant protein C NA
The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. 125 ENSG00000196616 ADH1B alcohol dehydrogenase 1B (class I), beta polypeptide NA
This gene encodes a major cytoplasmic protein which is the only known constituent common to submembranous plaques of both desmosomes and intermediate junctions. This protein forms distinct complexes with cadherins and desmosomal cadherins and is a member of the catenin family since it contains a distinct repeating amino acid motif called the armadillo repeat. Mutation in this gene has been associated with Naxos disease. Alternative splicing occurs in this gene; however, not all transcripts have been fully described. 3728 ENSG00000173801 JUP junction plakoglobin NA
The expression of this gene is induced by fasting as well as by progesterone. The protein encoded by this gene contains a t-synaptosome-associated protein receptor (SNARE) coiled-coil homology domain and a peroxisomal targeting signal. Production of the encoded protein leads to phosphorylation and activation of the transcription factor ELK1. 11067 ENSG00000165507 C10orf10 chromosome 10 open reading frame 10 NA
This gene is an ortholog of the C. elegans unc-76 gene, which is necessary for normal axonal bundling and elongation within axon bundles. Expression of this gene in C. elegans unc-76 mutants can restore to the mutants partial locomotion and axonal fasciculation, suggesting that it also functions in axonal outgrowth. The N-terminal half of the gene product is highly acidic. Alternatively spliced transcript variants encoding different isoforms of this protein have been described. 9638 ENSG00000149557 FEZ1 fasciculation and elongation protein zeta 1 NA
This gene encodes a member of the kelch-related family of actin-binding proteins. The encoded protein plays a role in the oxidative stress response as a regulator of the transcription factor Nrf2, and expression of this gene may play a role in malignant transformation. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 8507 ENSG00000171617 ENC1 ectodermal-neural cortex 1 NA
This gene encodes a member of the F-box protein family which is characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of the ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into 3 classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbxs class and contains an F-box domain. This protein is highly expressed during muscle atrophy, whereas mice deficient in this gene were found to be resistant to atrophy. This protein is thus a potential drug target for the treatment of muscle atrophy. Alternative splicing results in multiple transcript variants encoding different isoforms. 114907 ENSG00000156804 FBXO32 F-box protein 32 NA
NA 81691 ENSG00000005189 LOC81691 exonuclease NEF-sp NA
The protein encoded by this gene is induced by environmental stress and developmental changes. The encoded protein is involved in stress resistance and actin organization and translocates from the cytoplasm to the nucleus upon stress induction. Defects in this gene are a cause of Charcot-Marie-Tooth disease type 2F (CMT2F) and distal hereditary motor neuropathy (dHMN). 3315 ENSG00000106211 HSPB1 heat shock protein family B (small) member 1 NA
This gene belongs to the chemokine-like factor gene superfamily, a novel family that links the chemokine and the transmembrane 4 superfamilies of signaling molecules. The protein encoded by this gene may play an important role in testicular development. 146225 ENSG00000140932 CMTM2 CKLF like MARVEL transmembrane domain containing 2 NA
The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in stimulation of Ca2+-dependent insulin release, stimulation of prolactin secretion, and exocytosis. Chromosomal rearrangements and altered expression of this gene have been implicated in melanoma. 6277 ENSG00000197956 S100A6 S100 calcium binding protein A6 NA
This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. 1278 ENSG00000164692 COL1A2 collagen type I alpha 2 chain NA
This gene represents a ubiquitin gene, ubiquitin C. The encoded protein is a polyubiquitin precursor. Conjugation of ubiquitin monomers or polymers can lead to various effects within a cell, depending on the residues to which ubiquitin is conjugated. Ubiquitination has been associated with protein degradation, DNA repair, cell cycle regulation, kinase modification, endocytosis, and regulation of other cell signaling pathways. 7316 ENSG00000150991 UBC ubiquitin C NA
This gene encodes a protein that associates with basic transcription factor 3 (BTF3) to form the nascent polypeptide-associated complex (NAC). This complex binds to nascent proteins that lack a signal peptide motif as they emerge from the ribosome, blocking interaction with the signal recognition particle (SRP) and preventing mistranslocation to the endoplasmic reticulum. This protein is an IgE autoantigen in atopic dermatitis patients. Alternative splicing results in multiple transcript variants, but the full length nature of some of these variants, including those encoding very large proteins, has not been determined. There are multiple pseudogenes of this gene on different chromosomes. 4666 ENSG00000196531 NACA nascent polypeptide-associated complex alpha subunit NA
This gene encodes a cell surface glycoprotein and member of the immunoglobulin superfamily of proteins. The encoded protein is involved in cell adhesion and cell communication in numerous cell types, but particularly in cells of the immune and nervous systems. The encoded protein is widely used as a marker for hematopoietic stem cells. This gene may function as a tumor suppressor in nasopharyngeal carcinoma. Alternative splicing results in multiple transcript variants. 7070 ENSG00000154096 THY1 Thy-1 cell surface antigen NA
NA 58476 ENSG00000078804 TP53INP2 tumor protein p53 inducible nuclear protein 2 NA
This gene encodes a member of the fibroblast growth factor receptor (FGFR) family, with its amino acid sequence being highly conserved between members and among divergent species. FGFR family members differ from one another in their ligand affinities and tissue distribution. A full-length representative protein would consist of an extracellular region, composed of three immunoglobulin-like domains, a single hydrophobic membrane-spanning segment and a cytoplasmic tyrosine kinase domain. The extracellular portion of the protein interacts with fibroblast growth factors, setting in motion a cascade of downstream signals, ultimately influencing mitogenesis and differentiation. This particular family member binds acidic and basic fibroblast growth hormone and plays a role in bone development and maintenance. Mutations in this gene lead to craniosynostosis and multiple types of skeletal dysplasia. Three alternatively spliced transcript variants that encode different protein isoforms have been described. 2261 ENSG00000068078 FGFR3 fibroblast growth factor receptor 3 NA
The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. 2355 ENSG00000075426 FOSL2 FOS like 2, AP-1 transcription factor subunit NA
This is a paternally expressed imprinted gene that is thought to have been derived from the Ty3/Gypsy family of retrotransposons. It contains two overlapping open reading frames, RF1 and RF2, and expresses two proteins: a shorter, gag-like protein (with a CCHC-type zinc finger domain) from RF1; and a longer, gag/pol-like fusion protein (with an additional aspartic protease motif) from RF1/RF2 by -1 translational frameshifting (-1 FS). While -1 FS has been observed in RNA viruses and transposons in both prokaryotes and eukaryotes, this gene represents the first example of -1 FS in a eukaryotic cellular gene. This gene is highly conserved across mammalian species and retains the heptanucleotide (GGGAAAC) and pseudoknot elements required for -1 FS. It is expressed in adult and embryonic tissues (most notably in placenta) and reported to have a role in cell proliferation, differentiation and apoptosis. Overexpression of this gene has been associated with several malignancies, such as hepatocellular carcinoma and B-cell lymphocytic leukemia. Knockout mice lacking this gene showed early embryonic lethality with placental defects, indicating the importance of this gene in embryonic development. Additional isoforms resulting from alternatively spliced transcript variants, and use of upstream non-AUG (CUG) start codon have been reported for this gene. 23089 ENSG00000242265 PEG10 paternally expressed 10 NA
Plectin is a prominent member of an important family of structurally and in part functionally related proteins, termed plakins or cytolinkers, that are capable of interlinking different elements of the cytoskeleton. Plakins, with their multi-domain structure and enormous size, not only play crucial roles in maintaining cell and tissue integrity and orchestrating dynamic changes in cytoarchitecture and cell shape, but also serve as scaffolding platforms for the assembly, positioning, and regulation of signaling complexes (reviewed in PMID: 9701547, 11854008, and 17499243). Plectin is expressed as several protein isoforms in a wide range of cell types and tissues from a single gene located on chromosome 8 in humans (PMID: 8633055, 8698233). Until 2010, this locus was named plectin 1 (symbol PLEC1 in human; Plec1 in mouse and rat) and the gene product had been referred to as ‘hemidesmosomal protein 1’ or ‘plectin 1, intermediate filament binding 500kDa’. These names were superseded by plectin. The plectin gene locus in mouse on chromosome 15 has been analyzed in detail (PMID: 10556294, 14559777), revealing a genomic exon-intron organization with well over 40 exons spanning over 62 kb and an unusual 5’ transcript complexity of plectin isoforms. Eleven exons (1-1j) have been identified that alternatively splice directly into a common exon 2 which is the first exon to encode plectin’s highly conserved actin binding domain (ABD). Three additional exons (-1, 0a, and 0) splice into an alternative first coding exon (1c), and two additional exons (2alpha and 3alpha) are optionally spliced within the exons encoding the acting binding domain (exons 2-8). Analysis of the human locus has identified eight of the eleven alternative 5’ exons found in mouse and rat (PMID: 14672974); exons 1i, 1j and 1h have not been confirmed in human. 
Furthermore, isoforms lacking the central rod domain encoded by exon 31 have been detected in mouse (PMID:10556294), rat (PMID: 9177781), and human (PMID: 11441066, 10780662, 20052759). The short alternative amino-terminal sequences encoded by the different first exons direct the targeting of the various isoforms to distinct subcellular locations (PMID: 14559777). As the expression of specific plectin isoforms was found to be dependent on cell type (tissue) and stage of development (PMID: 10556294, 12542521, 17389230) it appears that each cell type (tissue) contains a unique set (proportion and composition) of plectin isoforms, as if custom-made for specific requirements of the particular cells. Concordantly, individual isoforms were found to carry out distinct and specific functions (PMID: 14559777, 12542521, 18541706). In 1996, a number of groups reported that patients suffering from epidermolysis bullosa simplex with muscular dystrophy (EBS-MD) lacked plectin expression in skin and muscle tissues due to defects in the plectin gene (PMID: 8698233, 8941634, 8636409, 8894687, 8696340). Two other subtypes of plectin-related EBS have been described: EBS-pyloric atresia (PA) and EBS-Ogna. For reviews of plectin-related diseases see PMID: 15810881, 19945614. Mutations in the plectin gene related to human diseases should be named based on the position in NM_000445 (variant 1, isoform 1c), unless the mutation is located within one of the other alternative first exons, in which case the position in the respective Reference Sequence should be used. 5339 ENSG00000178209 PLEC plectin NA
The protein encoded by this gene is similar to the protein transgelin, which is one of the earliest markers of differentiated smooth muscle. The specific function of this protein has not yet been determined, although it is thought to be a tumor suppressor. Multiple transcript variants encoding different isoforms have been found for this gene. 8407 ENSG00000158710 TAGLN2 transgelin 2 NA
NA 27129 ENSG00000173641 HSPB7 heat shock protein family B (small) member 7 NA
LY6G6C belongs to a cluster of leukocyte antigen-6 (LY6) genes located in the major histocompatibility complex (MHC) class III region on chromosome 6. Members of the LY6 superfamily typically contain 70 to 80 amino acids, including 8 to 10 cysteines. Most LY6 proteins are attached to the cell surface by a glycosylphosphatidylinositol (GPI) anchor that is directly involved in signal transduction (Mallya et al., 2002 [PubMed 12079290]). 80740 ENSG00000204421 LY6G6C lymphocyte antigen 6 complex, locus G6C NA
This gene encodes a protein that anchors intermediate filaments to desmosomal plaques and forms an obligate component of functional desmosomes. Mutations in this gene are the cause of several cardiomyopathies and keratodermas, including skin fragility-woolly hair syndrome. Alternative splicing results in multiple transcript variants. 1832 ENSG00000096696 DSP desmoplakin NA
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. As many as six of this type II cytokeratin (KRT6) have been identified; the multiplicity of the genes is attributed to successive gene duplication events. The genes are expressed with family members KRT16 and/or KRT17 in the filiform papillae of the tongue, the stratified epithelial lining of oral mucosa and esophagus, the outer root sheath of hair follicles, and the glandular epithelia. This KRT6 gene in particular encodes the most abundant isoform. Mutations in these genes have been associated with pachyonychia congenita. In addition, peptides from the C-terminal region of the protein have antimicrobial activity against bacterial pathogens. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3853 ENSG00000205420 KRT6A keratin 6A NA
This gene is one of several genes encoding pulmonary-surfactant associated proteins (SFTPA) located on chromosome 10. Mutations in this gene and a highly similar gene located nearby, which affect the highly conserved carbohydrate recognition domain, are associated with idiopathic pulmonary fibrosis. The current version of the assembly displays only a single centromeric SFTPA gene pair rather than the two gene pairs shown in the previous assembly which were thought to have resulted from a duplication. 729238 ENSG00000185303 SFTPA2 surfactant protein A2 NA
This gene is a member of the Period family of genes and is expressed in a circadian pattern in the suprachiasmatic nucleus, the primary circadian pacemaker in the mammalian brain. Genes in this family encode components of the circadian rhythms of locomotor activity, metabolism, and behavior. This gene is upregulated by CLOCK/ARNTL heterodimers but then represses this upregulation in a feedback loop using PER/CRY heterodimers to interact with CLOCK/ARNTL. Polymorphisms in this gene may increase the risk of getting certain cancers. Alternative splicing has been observed in this gene; however, these variants have not been fully described. 5187 ENSG00000179094 PER1 period circadian clock 1 NA
Protein phosphatase 2A is one of the four major Ser/Thr phosphatases and is implicated in the negative control of cell growth and division. Protein phosphatase 2A holoenzymes are heterotrimeric proteins composed of a structural subunit A, a catalytic subunit C, and a regulatory subunit B. The regulatory subunit is encoded by a diverse set of genes that have been grouped into the B/PR55, B'/PR61, and B''/PR72 families. These different regulatory subunits confer distinct enzymatic specificities and intracellular localizations to the holoenzyme. The product of this gene belongs to the B' family. This gene encodes a specific phosphotyrosyl phosphatase activator of the dimeric form of protein phosphatase 2A. Alternative splicing results in multiple transcript variants encoding different isoforms. 5524 ENSG00000119383 PTPA protein phosphatase 2 phosphatase activator NA
The protein encoded by this gene is a member of the ADAM (a disintegrin and metalloproteinase) protein family. ADAM family members are type I transmembrane glycoproteins known to be involved in cell adhesion and proteolytic ectodomain processing of cytokines and adhesion molecules. This protein contains multiple functional domains including a zinc-binding metalloprotease domain, a disintegrin-like domain, as well as a EGF-like domain. Through its disintegrin-like domain, this protein specifically interacts with the integrin beta chain, beta 3. It also interacts with Src family protein-tyrosine kinases in a phosphorylation-dependent manner, suggesting that this protein may function in cell-cell adhesion as well as in cellular signaling. Multiple alternatively spliced transcript variants encoding distinct isoforms have been observed. 8751 ENSG00000143537 ADAM15 ADAM metallopeptidase domain 15 NA
This gene encodes a member of the TSC22 domain family of leucine zipper transcription factors. The encoded protein is stimulated by transforming growth factor beta, and regulates the transcription of multiple genes including C-type natriuretic peptide. The encoded protein may play a critical role in tumor suppression through the induction of cancer cell apoptosis, and a single nucleotide polymorphism in the promoter of this gene has been associated with diabetic nephropathy. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 8848 ENSG00000102804 TSC22D1 TSC22 domain family member 1 NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_",20,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);

GTEx 2013 Factor analysis (sparse loadings: voom counts)

lambda_out <- read.table("../sfa_outputs/GTEX2013/voom_gtex/voom_gtex_sfa_lambda.out");
f_out <- t(read.table("../sfa_outputs/GTEX2013/voom_gtex/voom_gtex_sfa_F.out"));

gene_names <- as.vector(as.matrix(read.table("../sfa_inputs/gene_names_GTEX_V6.txt")));
gene_names <- substring(gene_names,1,15);
xli  <-  gene_names;
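
The `substring(gene_names, 1, 15)` call keeps the first 15 characters of each identifier. For human Ensembl gene IDs (`ENSG` followed by 11 digits) this presumably drops a trailing version suffix. A minimal sketch on made-up versioned IDs follows; the `.N` suffixes are invented for illustration and are not taken from the GTEx file. It also shows a regex alternative that does not rely on the fixed width:

```r
# Hypothetical versioned IDs; the ".N" suffixes are invented for illustration.
ids <- c("ENSG00000196531.6", "ENSG00000154096.9", "ENSG00000078804.8")

# Fixed-width truncation, as in the analysis code.
trimmed_fixed <- substring(ids, 1, 15)

# Alternative: strip a trailing ".<digits>" suffix, whatever the ID length.
trimmed_regex <- sub("\\.[0-9]+$", "", ids)

print(trimmed_fixed)
print(identical(trimmed_fixed, trimmed_regex))
```

The regex form may be safer if the input ever mixes identifiers of different lengths, since fixed-width truncation would silently cut those.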

indices_mat <- SFA.ExtractTopFeatures(f_out, top_features = 100,
                      options = "min",mult.annotate = TRUE)

gene_list <- do.call(rbind, lapply(1:dim(indices_mat)[1], function(x) gene_names[indices_mat[x,]]))
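
The `do.call(rbind, lapply(...))` line maps each row of feature indices to gene names, giving one row of names per factor. A toy sketch, assuming `indices_mat` is an integer matrix with one row per factor (the return shape of `SFA.ExtractTopFeatures` is not shown in this document, so that is an assumption; all `*_toy` names are hypothetical):

```r
# Toy gene names and a 3-factor x 2-feature index matrix (invented values).
gene_names_toy <- c("geneA", "geneB", "geneC", "geneD", "geneE")
indices_toy <- matrix(c(5, 1,
                        2, 4,
                        3, 5), nrow = 3, byrow = TRUE)

# Same pattern as the analysis code: look up names row by row, then stack.
gene_list_toy <- do.call(rbind, lapply(seq_len(nrow(indices_toy)),
                                       function(x) gene_names_toy[indices_toy[x, ]]))

# Equivalent vectorised lookup: index once, then restore the matrix shape.
gene_list_alt <- matrix(gene_names_toy[indices_toy], nrow = nrow(indices_toy))

print(gene_list_toy)
```

Using `seq_len(nrow(...))` instead of `1:dim(...)[1]` avoids iterating over `1:0` if the index matrix is ever empty.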

SFA loadings plot

samples_id <- read.table("../sfa_inputs/samples_id.txt");

# Tissue label for each sample (third column of the sample sheet)
tissue_labels <- samples_id[ ,3]

tissue_levels <- unique(tissue_labels);


cumsum_val <- c(1,cumsum(as.numeric(table(tissue_labels))))
cumsum_low <- cumsum_val[1:(length(cumsum_val)-1)]
cumsum_high <- cumsum_val[2:(length(cumsum_val))];
cumsum_mean <- 0.5*(cumsum_low+cumsum_high)
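
The cumulative sums place tissue boundaries and label midpoints along the x-axis of the bar plot. The sketch below uses invented labels to show the arithmetic and a caveat: `table()` orders its counts by sorted label, whereas `unique()` returns labels in order of first appearance, so the two agree only when samples are already sorted by tissue. The `rle()` variant is an alternative under the assumption that samples are grouped by tissue in the file; it is not the approach used in this analysis.

```r
# Invented labels: grouped by tissue but not alphabetically sorted.
labels_toy <- c("Lung", "Lung", "Lung", "Brain", "Brain", "Heart")

# table() counts follow alphabetical order: Brain, Heart, Lung.
counts_tab <- as.numeric(table(labels_toy))

# rle() counts follow file order: Lung, Brain, Heart (matches unique()).
runs <- rle(labels_toy)

# With barplot(space = 0), bar i spans [i - 1, i], so boundaries start at 0.
bounds <- c(0, cumsum(runs$lengths))
mids <- 0.5 * (head(bounds, -1) + tail(bounds, -1))

print(data.frame(tissue = runs$values, midpoint = mids))
```

If `tissue_labels` is a factor rather than a character vector, it would need `as.character()` before being passed to `rle()`.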

for(k in 1:20){
  png(paste0("../sfa_outputs/GTEX2013_transpose/sfa-figures/voom_sparse_load_loadings/gtex_sfa_loadings_",k,".png"), width=4, height=4, units="in", res=600)
  # Wide bottom margin leaves room for the rotated tissue labels
  par(mar=c(10,3,2,2))
  barplot(lambda_out[,k], axisnames=FALSE, space=0, border=NA,
          main=paste0("SFA on gtex expression: loading:", k),
          las=1, cex.axis=0.3, cex.main=0.4,
          ylim=range(lambda_out[,k]))
  # Tissue names at the midpoint of each block, vertical lines at block boundaries
  axis(1, at=cumsum_mean, labels=unique(tissue_labels), las=2, cex.axis=0.3)
  abline(v=cumsum_high)
  dev.off()
}

Factor 1 Annotations

out <- mygene::queryMany(gene_list[1,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id query name summary notfound
MT1A 4489 ENSG00000205362 metallothionein 1A NA NA
ERRFI1 54206 ENSG00000116285 ERBB receptor feedback inhibitor 1 ERRFI1 is a cytoplasmic protein whose expression is upregulated with cell growth (Wick et al., 1995 [PubMed 7641805]). It shares significant homology with the protein product of rat gene-33, which is induced during cell stress and mediates cell signaling (Makkinje et al., 2000 [PubMed 10749885]; Fiorentino et al., 2000 [PubMed 11003669]). NA
MYC 4609 ENSG00000136997 v-myc avian myelocytomatosis viral oncogene homolog The protein encoded by this gene is a multifunctional, nuclear phosphoprotein that plays a role in cell cycle progression, apoptosis and cellular transformation. It functions as a transcription factor that regulates transcription of specific target genes. Mutations, overexpression, rearrangement and translocation of this gene have been associated with a variety of hematopoietic tumors, leukemias and lymphomas, including Burkitt lymphoma. There is evidence to show that alternative translation initiations from an upstream, in-frame non-AUG (CUG) and a downstream AUG start site result in the production of two isoforms with distinct N-termini. The synthesis of non-AUG initiated protein is suppressed in Burkitt’s lymphomas, suggesting its importance in the normal function of this gene. NA
CXCL2 2920 ENSG00000081041 C-X-C motif chemokine ligand 2 This antimicrobial gene is part of a chemokine superfamily that encodes secreted proteins involved in immunoregulatory and inflammatory processes. The superfamily is divided into four subfamilies based on the arrangement of the N-terminal cysteine residues of the mature peptide. This chemokine, a member of the CXC subfamily, is expressed at sites of inflammation and may suppress hematopoietic progenitor cell proliferation. NA
SYCN 342898 ENSG00000179751 syncollin NA NA
NKD2 85409 ENSG00000145506 naked cuticle homolog 2 This gene encodes a member of a family of proteins that function as negative regulators of Wnt receptor signaling through interaction with Dishevelled family members. The encoded protein participates in the delivery of transforming growth factor alpha-containing vesicles to the cell membrane. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
FOSB 2354 ENSG00000125740 FosB proto-oncogene, AP-1 transcription factor subunit The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
FOSL1 8061 ENSG00000175592 FOS like 1, AP-1 transcription factor subunit The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. Several transcript variants encoding different isoforms have been found for this gene. NA
UAP1 6675 ENSG00000117143 UDP-N-acetylglucosamine pyrophosphorylase 1 NA NA
KLF10 7071 ENSG00000155090 Kruppel like factor 10 This gene encodes a member of a family of proteins that feature C2H2-type zinc finger domains. The encoded protein is a transcriptional repressor that acts as an effector of transforming growth factor beta signaling. Activity of this protein may inhibit the growth of cancers, particularly pancreatic cancer. Alternative splicing results in multiple transcript variants. NA
NA NA ENSG00000179294 NA NA TRUE
STC2 8614 ENSG00000113739 stanniocalcin 2 This gene encodes a secreted, homodimeric glycoprotein that is expressed in a wide variety of tissues and may have autocrine or paracrine functions. The encoded protein has 10 of its 15 cysteine residues conserved among stanniocalcin family members and is phosphorylated by casein kinase 2 exclusively on its serine residues. Its C-terminus contains a cluster of histidine residues which may interact with metal ions. The protein may play a role in the regulation of renal and intestinal calcium and phosphate transport, cell metabolism, or cellular calcium/phosphate homeostasis. Constitutive overexpression of human stanniocalcin 2 in mice resulted in pre- and postnatal growth restriction, reduced bone and skeletal muscle growth, and organomegaly. Expression of this gene is induced by estrogen and altered in some breast cancers. NA
NR4A3 8013 ENSG00000119508 nuclear receptor subfamily 4 group A member 3 This gene encodes a member of the steroid-thyroid hormone-retinoid receptor superfamily. The encoded protein may act as a transcriptional activator. The protein can efficiently bind the NGFI-B Response Element (NBRE). Three different versions of extraskeletal myxoid chondrosarcomas (EMCs) are the result of reciprocal translocations between this gene and other genes. The translocation breakpoints are associated with Nuclear Receptor Subfamily 4, Group A, Member 3 (on chromosome 9) and either Ewing Sarcoma Breakpoint Region 1 (on chromosome 22), RNA Polymerase II, TATA Box-Binding Protein-Associated Factor, 68-KD (on chromosome 17), or Transcription factor 12 (on chromosome 15). Multiple transcript variants encoding different isoforms have been found for this gene. NA
SDC4 6385 ENSG00000124145 syndecan 4 The protein encoded by this gene is a transmembrane (type I) heparan sulfate proteoglycan that functions as a receptor in intracellular signaling. The encoded protein is found as a homodimer and is a member of the syndecan proteoglycan family. This gene is found on chromosome 20, while a pseudogene has been found on chromosome 22. NA
CPA2 1358 ENSG00000158516 carboxypeptidase A2 Three different forms of human pancreatic procarboxypeptidase A have been isolated. The encoded protein represents the A2 form, which is a monomeric protein with different biochemical properties from the A1 and A3 forms. The A2 form of pancreatic procarboxypeptidase acts on aromatic C-terminal residues and is a secreted protein. NA
SAMD4A 23034 ENSG00000020577 sterile alpha motif domain containing 4A Sterile alpha motifs (SAMs) in proteins such as SAMD4A are part of an RNA-binding domain that functions as a posttranscriptional regulator by binding to an RNA sequence motif known as the Smaug recognition element, which was named after the Drosophila Smaug protein (Baez and Boccaccio, 2005 [PubMed 16221671]). NA
SPSB1 80176 ENSG00000171621 splA/ryanodine receptor domain and SOCS box containing 1 NA NA
TNFRSF11B 4982 ENSG00000164761 tumor necrosis factor receptor superfamily member 11b The protein encoded by this gene is a member of the TNF-receptor superfamily. This protein is an osteoblast-secreted decoy receptor that functions as a negative regulator of bone resorption. This protein specifically binds to its ligand, osteoprotegerin ligand, both of which are key extracellular regulators of osteoclast development. Studies of the mouse counterpart also suggest that this protein and its ligand play a role in lymph-node organogenesis and vascular calcification. Alternatively spliced transcript variants of this gene have been reported, but their full length nature has not been determined. NA
ZC3H12A 80149 ENSG00000163874 zinc finger CCCH-type containing 12A ZC3H12A is an MCP1 (CCL2; MIM 158105)-induced protein that acts as a transcriptional activator and causes cell death of cardiomyocytes, possibly via induction of genes associated with apoptosis. NA
AQP9 366 ENSG00000103569 aquaporin 9 The aquaporins are a family of water-selective membrane channels. This gene encodes a member of a subset of aquaporins called the aquaglyceroporins. This protein allows passage of a broad range of noncharged solutes and also stimulates urea transport and osmotic water permeability. This protein may also facilitate the uptake of glycerol in hepatic tissue. The encoded protein may also play a role in specialized leukocyte functions such as immunological response and bactericidal activity. Alternate splicing results in multiple transcript variants. NA
FAM83G 644815 ENSG00000188522 family with sequence similarity 83 member G NA NA
ANKRD53 79998 ENSG00000144031 ankyrin repeat domain 53 NA NA
DDX21 9188 ENSG00000165732 DEAD-box helicase 21 DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of this family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. This gene encodes a DEAD box protein, which is an antigen recognized by autoimmune antibodies from a patient with watermelon stomach disease. This protein unwinds double-stranded RNA, folds single-stranded RNA, and may play important roles in ribosomal RNA biogenesis, RNA editing, RNA transport, and general transcription. NA
YBX3 8531 ENSG00000060138 Y-box binding protein 3 NA NA
DPT 1805 ENSG00000143196 dermatopontin Dermatopontin is an extracellular matrix protein with possible functions in cell-matrix interactions and matrix assembly. The protein is found in various tissues and many of its tyrosine residues are sulphated. Dermatopontin is postulated to modify the behavior of TGF-beta through interaction with decorin. NA
CCDC86 79080 ENSG00000110104 coiled-coil domain containing 86 NA NA
RAMP1 10267 ENSG00000132329 receptor activity modifying protein 1 The protein encoded by this gene is a member of the RAMP family of single-transmembrane-domain proteins, called receptor (calcitonin) activity modifying proteins (RAMPs). RAMPs are type I transmembrane proteins with an extracellular N terminus and a cytoplasmic C terminus. RAMPs are required to transport calcitonin-receptor-like receptor (CRLR) to the plasma membrane. CRLR, a receptor with seven transmembrane domains, can function as either a calcitonin-gene-related peptide (CGRP) receptor or an adrenomedullin receptor, depending on which members of the RAMP family are expressed. In the presence of this (RAMP1) protein, CRLR functions as a CGRP receptor. The RAMP1 protein is involved in the terminal glycosylation, maturation, and presentation of the CGRP receptor to the cell surface. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
SYNM 23336 ENSG00000182253 synemin The protein encoded by this gene is an intermediate filament (IF) family member. IF proteins are cytoskeletal proteins that confer resistance to mechanical stress and are encoded by a dispersed multigene family. This protein has been found to form a linkage between desmin, which is a subunit of the IF network, and the extracellular matrix, and provides an important structural support in muscle. Two alternatively spliced variants encoding different isoforms have been described for this gene. NA
BAG2 9532 ENSG00000112208 BCL2 associated athanogene 2 BAG proteins compete with Hip for binding to the Hsc70/Hsp70 ATPase domain and promote substrate release. All the BAG proteins have an approximately 45-amino acid BAG domain near the C terminus but differ markedly in their N-terminal regions. The predicted BAG2 protein contains 211 amino acids. The BAG domains of BAG1, BAG2, and BAG3 interact specifically with the Hsc70 ATPase domain in vitro and in mammalian cells. All 3 proteins bind with high affinity to the ATPase domain of Hsc70 and inhibit its chaperone activity in a Hip-repressible manner. NA
RP11-804A23.2 ENSG00000255959 ENSG00000255959 NA NA NA
TUBG1 7283 ENSG00000131462 tubulin gamma 1 This gene encodes a member of the tubulin superfamily. The encoded protein localizes to the centrosome where it binds to microtubules as part of a complex referred to as the gamma-tubulin ring complex. The protein mediates microtubule nucleation and is required for microtubule formation and progression of the cell cycle. A pseudogene of this gene is found on chromosome 7. NA
HSPB7 27129 ENSG00000173641 heat shock protein family B (small) member 7 NA NA
ICAM1 3383 ENSG00000090339 intercellular adhesion molecule 1 This gene encodes a cell surface glycoprotein which is typically expressed on endothelial cells and cells of the immune system. It binds to integrins of type CD11a / CD18, or CD11b / CD18 and is also exploited by Rhinovirus as a receptor. NA
CES1 1066 ENSG00000198848 carboxylesterase 1 This gene encodes a member of the carboxylesterase large family. The family members are responsible for the hydrolysis or transesterification of various xenobiotics, such as cocaine and heroin, and endogenous substrates with ester, thioester, or amide bonds. They may participate in fatty acyl and cholesterol ester metabolism, and may play a role in the blood-brain barrier system. This enzyme is the major liver enzyme and functions in liver drug clearance. Mutations of this gene cause carboxylesterase 1 deficiency. Three transcript variants encoding three different isoforms have been found for this gene. NA
RP11-316O14.1 ENSG00000268603 ENSG00000268603 NA NA NA
BCHE 590 ENSG00000114200 butyrylcholinesterase Mutant alleles at the BCHE locus are responsible for suxamethonium sensitivity. Homozygous persons sustain prolonged apnea after administration of the muscle relaxant suxamethonium in connection with surgical anesthesia. The activity of pseudocholinesterase in the serum is low and its substrate behavior is atypical. In the absence of the relaxant, the homozygote is at no known disadvantage. NA
MT2A 4502 ENSG00000125148 metallothionein 2A NA NA
MT1M 4499 ENSG00000205364 metallothionein 1M This gene encodes a member of the metallothionein superfamily, type 1 family. Metallothioneins have a high content of cysteine residues that bind various heavy metals. These genes are transcriptionally regulated by both heavy metals and glucocorticoids. NA
ALDH1B1 219 ENSG00000137124 aldehyde dehydrogenase 1 family member B1 This protein belongs to the aldehyde dehydrogenases family of proteins. Aldehyde dehydrogenase is the second enzyme of the major oxidative pathway of alcohol metabolism. This gene does not contain introns in the coding sequence. The variation of this locus may affect the development of alcohol-related problems. NA
NKX3-1 4824 ENSG00000167034 NK3 homeobox 1 This gene encodes a homeobox-containing transcription factor. This transcription factor functions as a negative regulator of epithelial cell growth in prostate tissue. Aberrant expression of this gene is associated with prostate tumor progression. Alternate splicing results in multiple transcript variants of this gene. NA
RP11-396F22.1 ENSG00000257718 ENSG00000257718 NA NA NA
CCL2 6347 ENSG00000108691 C-C motif chemokine ligand 2 This gene is one of several cytokine genes clustered on the q-arm of chromosome 17. Chemokines are a superfamily of secreted proteins involved in immunoregulatory and inflammatory processes. The superfamily is divided into four subfamilies based on the arrangement of N-terminal cysteine residues of the mature peptide. This chemokine is a member of the CC subfamily which is characterized by two adjacent cysteine residues. This cytokine displays chemotactic activity for monocytes and basophils but not for neutrophils or eosinophils. It has been implicated in the pathogenesis of diseases characterized by monocytic infiltrates, like psoriasis, rheumatoid arthritis and atherosclerosis. It binds to chemokine receptors CCR2 and CCR4. NA
TEAD2 8463 ENSG00000074219 TEA domain transcription factor 2 NA NA
FAM83D 81610 ENSG00000101447 family with sequence similarity 83 member D NA NA
CDKN1A 1026 ENSG00000124762 cyclin-dependent kinase inhibitor 1A This gene encodes a potent cyclin-dependent kinase inhibitor. The encoded protein binds to and inhibits the activity of cyclin-cyclin-dependent kinase2 or -cyclin-dependent kinase4 complexes, and thus functions as a regulator of cell cycle progression at G1. The expression of this gene is tightly controlled by the tumor suppressor protein p53, through which this protein mediates the p53-dependent cell cycle G1 phase arrest in response to a variety of stress stimuli. This protein can interact with proliferating cell nuclear antigen, a DNA polymerase accessory factor, and plays a regulatory role in S phase DNA replication and DNA damage repair. This protein was reported to be specifically cleaved by CASP3-like caspases, which thus leads to a dramatic activation of cyclin-dependent kinase2, and may be instrumental in the execution of apoptosis following caspase activation. Mice that lack this gene have the ability to regenerate damaged or missing tissue. Multiple alternatively spliced variants have been found for this gene. NA
FILIP1L 11259 ENSG00000168386 filamin A interacting protein 1 like NA NA
ARG1 383 ENSG00000118520 arginase 1 Arginase catalyzes the hydrolysis of arginine to ornithine and urea. At least two isoforms of mammalian arginase exist (types I and II) which differ in their tissue distribution, subcellular localization, immunologic crossreactivity and physiologic function. The type I isoform encoded by this gene, is a cytosolic enzyme and expressed predominantly in the liver as a component of the urea cycle. Inherited deficiency of this enzyme results in argininemia, an autosomal recessive disorder characterized by hyperammonemia. Two transcript variants encoding different isoforms have been found for this gene. NA
MEDAG 84935 ENSG00000102802 mesenteric estrogen dependent adipogenesis NA NA
AMOTL1 154810 ENSG00000166025 angiomotin like 1 The protein encoded by this gene is a peripheral membrane protein that is a component of tight junctions or TJs. TJs form an apical junctional structure and act to control paracellular permeability and maintain cell polarity. This protein is related to angiomotin, an angiostatin binding protein that regulates endothelial cell migration and capillary formation. Two transcript variants encoding different isoforms have been found for this gene. NA
SRSF12 135295 ENSG00000154548 serine and arginine rich splicing factor 12 NA NA
ANO1-AS1 ENSG00000254902 ENSG00000254902 ANO1 antisense RNA 1 NA NA
PLIN5 440503 ENSG00000214456 perilipin 5 Members of the perilipin family, such as PLIN5, coat intracellular lipid storage droplets and protect them from lipolytic degradation (Dalen et al., 2007 [PubMed 17234449]). NA
RP11-973D8.4 ENSG00000258554 ENSG00000258554 NA NA NA
LDLR 3949 ENSG00000130164 low density lipoprotein receptor The low density lipoprotein receptor (LDLR) gene family consists of cell surface proteins involved in receptor-mediated endocytosis of specific ligands. Low density lipoprotein (LDL) is normally bound at the cell membrane and taken into the cell ending up in lysosomes where the protein is degraded and the cholesterol is made available for repression of microsomal enzyme 3-hydroxy-3-methylglutaryl coenzyme A (HMG CoA) reductase, the rate-limiting step in cholesterol synthesis. At the same time, a reciprocal stimulation of cholesterol ester synthesis takes place. Mutations in this gene cause the autosomal dominant disorder, familial hypercholesterolemia. Alternate splicing results in multiple transcript variants. NA
LARP1B 55132 ENSG00000138709 La ribonucleoprotein domain family member 1B This gene encodes a protein containing domains found in the La related protein of Drosophila melanogaster. La motif-containing proteins are thought to be RNA-binding proteins, where the La motif and adjacent amino acids fold into an RNA recognition motif. The La motif is also found in proteins unrelated to the La protein. Alternative splicing has been observed at this locus and multiple variants, encoding distinct isoforms, are described. Additional splice variation has been identified but the full-length nature of these transcripts has not been determined. NA
KLHL25 64410 ENSG00000183655 kelch like family member 25 NA NA
RP11-791G15.2 ENSG00000272275 ENSG00000272275 NA NA NA
IL15RA 3601 ENSG00000134470 interleukin 15 receptor subunit alpha This gene encodes a cytokine receptor that specifically binds interleukin 15 (IL15) with high affinity. The receptors of IL15 and IL2 share two subunits, IL2R beta and IL2R gamma. This forms the basis of many overlapping biological activities of IL15 and IL2. The protein encoded by this gene is structurally related to IL2R alpha, an additional IL2-specific alpha subunit necessary for high affinity IL2 binding. Unlike IL2RA, IL15RA is capable of binding IL15 with high affinity independent of other subunits, which suggests distinct roles between IL15 and IL2. This receptor is reported to enhance cell proliferation and expression of apoptosis inhibitor BCL2L1/BCL2-XL and BCL2. Multiple alternatively spliced transcript variants of this gene have been reported. NA
OPLAH 26873 ENSG00000178814 5-oxoprolinase (ATP-hydrolysing) The protein encoded by this gene acts as a homodimer, using ATP hydrolysis to catalyze the conversion of 5-oxo-L-proline to L-glutamate. Defects in this gene are a cause of 5-oxoprolinase deficiency (OPLAHD). NA
DUX4L50 ENSG00000232815 ENSG00000232815 double homeobox 4 like 50, pseudogene NA NA
ARMC9 80210 ENSG00000135931 armadillo repeat containing 9 NA NA
RP11-253E3.3 ENSG00000250899 ENSG00000250899 NA NA NA
RP11-6O2.3 ENSG00000261616 ENSG00000261616 NA NA NA
BCL6 604 ENSG00000113916 B-cell CLL/lymphoma 6 The protein encoded by this gene is a zinc finger transcription factor and contains an N-terminal POZ domain. This protein acts as a sequence-specific repressor of transcription, and has been shown to modulate the transcription of STAT-dependent IL-4 responses of B cells. This protein can interact with a variety of POZ-containing proteins that function as transcription corepressors. This gene is found to be frequently translocated and hypermutated in diffuse large-cell lymphoma (DLCL), and may be involved in the pathogenesis of DLCL. Alternatively spliced transcript variants encoding different protein isoforms have been found for this gene. NA
RP11-554A11.9 ENSG00000259799 ENSG00000259799 NA NA NA
PPRC1 23082 ENSG00000148840 peroxisome proliferator-activated receptor gamma, coactivator-related 1 The protein encoded by this gene is similar to PPAR-gamma coactivator 1 (PPARGC1/PGC-1), a protein that can activate mitochondrial biogenesis in part through a direct interaction with nuclear respiratory factor 1 (NRF1). This protein has been shown to interact with NRF1. It is thought to be a functional relative of PPAR-gamma coactivator 1 that activates mitochondrial biogenesis through NRF1 in response to proliferative signals. Alternative splicing results in multiple transcript variants. NA
CLIC3 9022 ENSG00000169583 chloride intracellular channel 3 Chloride channels are a diverse group of proteins that regulate fundamental cellular processes including stabilization of cell membrane potential, transepithelial transport, maintenance of intracellular pH, and regulation of cell volume. Chloride intracellular channel 3 is a member of the p64 family and is predominantly localized in the nucleus and stimulates chloride ion channel activity. In addition, this protein may participate in cellular growth control, based on its association with ERK7, a member of the MAP kinase family. NA
CTD-2369P2.8 ENSG00000267607 ENSG00000267607 NA NA NA
CLCF1 23529 ENSG00000175505 cardiotrophin-like cytokine factor 1 This gene is a member of the glycoprotein (gp)130 cytokine family and encodes cardiotrophin-like cytokine factor 1 (CLCF1). CLCF1 forms a heterodimer complex with cytokine receptor-like factor 1 (CRLF1). This dimer competes with ciliary neurotrophic factor (CNTF) for binding to the ciliary neurotrophic factor receptor (CNTFR) complex, and activates the Jak-STAT signaling cascade. CLCF1 can be actively secreted from cells by forming a complex with soluble type I CRLF1 or soluble CNTFR. CLCF1 is a potent neurotrophic factor, B-cell stimulatory agent and neuroendocrine modulator of pituitary corticotroph function. Defects in CLCF1 cause cold-induced sweating syndrome 2 (CISS2). This syndrome is characterized by a profuse sweating after exposure to cold as well as congenital physical abnormalities of the head and spine. Alternative splicing results in multiple transcript variants encoding distinct isoforms. NA
AEN 64782 ENSG00000181026 apoptosis enhancing nuclease NA NA
BTG2 7832 ENSG00000159388 BTG family member 2 The protein encoded by this gene is a member of the BTG/Tob family. This family has structurally related proteins that appear to have antiproliferative properties. This encoded protein is involved in the regulation of the G1/S transition of the cell cycle. NA
MT1X 4501 ENSG00000187193 metallothionein 1X NA NA
PKDCC 91461 ENSG00000162878 protein kinase domain containing, cytoplasmic NA NA
RP11-343H19.2 ENSG00000259827 ENSG00000259827 NA NA NA
IPO5 3843 ENSG00000065150 importin 5 Nucleocytoplasmic transport, a signal- and energy-dependent process, takes place through nuclear pore complexes embedded in the nuclear envelope. The import of proteins containing a nuclear localization signal (NLS) requires the NLS import receptor, a heterodimer of importin alpha and beta subunits also known as karyopherins. Importin alpha binds the NLS-containing cargo in the cytoplasm and importin beta docks the complex at the cytoplasmic side of the nuclear pore complex. In the presence of nucleoside triphosphates and the small GTP binding protein Ran, the complex moves into the nuclear pore complex and the importin subunits dissociate. Importin alpha enters the nucleoplasm with its passenger protein and importin beta remains at the pore. Interactions between importin beta and the FG repeats of nucleoporins are essential in translocation through the pore complex. The protein encoded by this gene is a member of the importin beta family. NA
CDK2 1017 ENSG00000123374 cyclin-dependent kinase 2 This gene encodes a member of a family of serine/threonine protein kinases that participate in cell cycle regulation. The encoded protein is the catalytic subunit of the cyclin-dependent protein kinase complex, which regulates progression through the cell cycle. Activity of this protein is especially critical during the G1 to S phase transition. This protein associates with and is regulated by other subunits of the complex including cyclin A or E, CDK inhibitor p21Cip1 (CDKN1A), and p27Kip1 (CDKN1B). Alternative splicing results in multiple transcript variants. NA
PLA2G1B 5319 ENSG00000170890 phospholipase A2 group IB This gene encodes a secreted member of the phospholipase A2 (PLA2) class of enzymes, which is produced by the pancreatic acinar cells. The encoded calcium-dependent enzyme catalyzes the hydrolysis of the sn-2 position of membrane glycerophospholipids to release arachidonic acid (AA) and lysophospholipids. AA is subsequently converted by downstream metabolic enzymes to several bioactive lipophilic compounds (eicosanoids), including prostaglandins (PGs) and leukotrienes (LTs). The enzyme may be involved in several physiological processes including cell contraction, cell proliferation and pathological response. NA
CACNB2 783 ENSG00000165995 calcium voltage-gated channel auxiliary subunit beta 2 This gene encodes a subunit of a voltage-dependent calcium channel protein that is a member of the voltage-gated calcium channel superfamily. The gene product was originally identified as an antigen target in Lambert-Eaton myasthenic syndrome, an autoimmune disorder. Mutations in this gene are associated with Brugada syndrome. Alternatively spliced variants encoding different isoforms have been described. NA
GMPSP1 ENSG00000250471 ENSG00000250471 guanine monophosphate synthase pseudogene 1 NA NA
PABPC4 8761 ENSG00000090621 poly(A) binding protein cytoplasmic 4 Poly(A)-binding proteins (PABPs) bind to the poly(A) tail present at the 3-prime ends of most eukaryotic mRNAs. PABPC4 or IPABP (inducible PABP) was isolated as an activation-induced T-cell mRNA encoding a protein. Activation of T cells increased PABPC4 mRNA levels in T cells approximately 5-fold. PABPC4 contains 4 RNA-binding domains and proline-rich C terminus. PABPC4 is localized primarily to the cytoplasm. It is suggested that PABPC4 might be necessary for regulation of stability of labile mRNA species in activated T cells. PABPC4 was also identified as an antigen, APP1 (activated-platelet protein-1), expressed on thrombin-activated rabbit platelets. PABPC4 may also be involved in the regulation of protein translation in platelets and megakaryocytes or may participate in the binding or stabilization of polyadenylates in platelet dense granules. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
SGMS2 166929 ENSG00000164023 sphingomyelin synthase 2 Sphingomyelin, a major component of cell and Golgi membranes, is made by the transfer of phosphocholine from phosphatidylcholine onto ceramide, with diacylglycerol as a side product. The protein encoded by this gene is an enzyme that catalyzes this reaction primarily at the cell membrane. The synthesis is reversible, and this enzyme can catalyze the reaction in either direction. The encoded protein is required for cell growth. Three transcript variants encoding the same protein have been found for this gene. There is evidence for more variants, but the full-length nature of their transcripts has not been determined. NA
SLCO4A1 28231 ENSG00000101187 solute carrier organic anion transporter family member 4A1 NA NA
CTXN1 404217 ENSG00000178531 cortexin 1 NA NA
SOD3 6649 ENSG00000109610 superoxide dismutase 3, extracellular This gene encodes a member of the superoxide dismutase (SOD) protein family. SODs are antioxidant enzymes that catalyze the conversion of superoxide radicals into hydrogen peroxide and oxygen, which may protect the brain, lungs, and other tissues from oxidative stress. Proteolytic processing of the encoded protein results in the formation of two distinct homotetramers that differ in their ability to interact with the extracellular matrix (ECM). Homotetramers consisting of the intact protein, or type C subunit, exhibit high affinity for heparin and are anchored to the ECM. Homotetramers consisting of a proteolytically cleaved form of the protein, or type A subunit, exhibit low affinity for heparin and do not interact with the ECM. A mutation in this gene may be associated with increased heart disease risk. NA
CIART 148523 ENSG00000159208 circadian associated repressor of transcription NA NA
AC017104.6 ENSG00000224376 ENSG00000224376 NA NA NA
CHST3 9469 ENSG00000122863 carbohydrate sulfotransferase 3 This gene encodes an enzyme which catalyzes the sulfation of chondroitin, a proteoglycan found in the extracellular matrix and most cells which is involved in cell migration and differentiation. Mutations in this gene are associated with spondylepiphyseal dysplasia and humerospinal dysostosis. NA
TMEM45A 55076 ENSG00000181458 transmembrane protein 45A NA NA
IL18R1 8809 ENSG00000115604 interleukin 18 receptor 1 The protein encoded by this gene is a cytokine receptor that belongs to the interleukin 1 receptor family. This receptor specifically binds interleukin 18 (IL18), and is essential for IL18 mediated signal transduction. IFN-alpha and IL12 are reported to induce the expression of this receptor in NK and T cells. This gene along with four other members of the interleukin 1 receptor family, including IL1R2, IL1R1, IL1RL2 (IL-1Rrp2), and IL1RL1 (T1/ST2), form a gene cluster on chromosome 2q. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
TUBB6 84617 ENSG00000176014 tubulin beta 6 class V NA NA
PIM1 5292 ENSG00000137193 Pim-1 proto-oncogene, serine/threonine kinase The protein encoded by this gene belongs to the Ser/Thr protein kinase family, and PIM subfamily. This gene is expressed primarily in B-lymphoid and myeloid cell lines, and is overexpressed in hematopoietic malignancies and in prostate cancer. It plays a role in signal transduction in blood cells, contributing to both cell proliferation and survival, and thus provides a selective advantage in tumorigenesis. Both the human and orthologous mouse genes have been reported to encode two isoforms (with preferential cellular localization) resulting from the use of alternative in-frame translation initiation codons, the upstream non-AUG (CUG) and downstream AUG codons (PMIDs:16186805, 1825810). NA
SLC19A2 10560 ENSG00000117479 solute carrier family 19 member 2 This gene encodes the thiamin transporter protein. Mutations in this gene cause thiamin-responsive megaloblastic anemia syndrome (TRMA), which is an autosomal recessive disorder characterized by diabetes mellitus, megaloblastic anemia and sensorineural deafness. Two transcript variants encoding different isoforms have been found for this gene. NA
TCONS_00029157 101928399 ENSG00000237989 uncharacterized LOC101928399 NA NA
UTP4 84916 ENSG00000141076 UTP4, small subunit processome component This gene encodes a WD40-repeat-containing protein that is localized to the nucleolus. Mutation of this gene causes North American Indian childhood cirrhosis, a severe intrahepatic cholestasis that results in transient neonatal jaundice, and progresses to periportal fibrosis and cirrhosis in childhood and adolescence. Alternative splicing results in multiple transcript variants. NA
PANX1 24145 ENSG00000110218 pannexin 1 The protein encoded by this gene belongs to the innexin family. Innexin family members are the structural components of gap junctions. This protein and pannexin 2 are abundantly expressed in the central nervous system (CNS) and are coexpressed in various neuronal populations. Studies in Xenopus oocytes suggest that this protein alone and in combination with pannexin 2 may form cell type-specific gap junctions with distinct properties. NA
ODC1 4953 ENSG00000115758 ornithine decarboxylase 1 This gene encodes the rate-limiting enzyme of the polyamine biosynthesis pathway which catalyzes ornithine to putrescine. The activity level for the enzyme varies in response to growth-promoting stimuli and exhibits a high turnover rate in comparison to other mammalian proteins. Originally localized to both chromosomes 2 and 7, the gene encoding this enzyme has been determined to be located on 2p25, with a pseudogene located on 7q31-qter. Multiple alternatively spliced transcript variants encoding distinct isoforms have been identified. NA
TNFRSF10D 8793 ENSG00000173530 tumor necrosis factor receptor superfamily member 10d The protein encoded by this gene is a member of the TNF-receptor superfamily. This receptor contains an extracellular TRAIL-binding domain, a transmembrane domain, and a truncated cytoplasmic death domain. This receptor does not induce apoptosis, and has been shown to play an inhibitory role in TRAIL-induced cell apoptosis. NA
COL15A1 1306 ENSG00000204291 collagen type XV alpha 1 chain This gene encodes the alpha chain of type XV collagen, a member of the FACIT collagen family (fibril-associated collagens with interrupted helices). Type XV collagen has a wide tissue distribution but the strongest expression is localized to basement membrane zones so it may function to adhere basement membranes to underlying connective tissue stroma. The proteolytically produced C-terminal fragment of type XV collagen is restin, a potentially antiangiogenic protein that is closely related to endostatin. Mouse studies have shown that collagen XV deficiency is associated with muscle and microvessel deterioration. NA
RBPMS 11030 ENSG00000157110 RNA binding protein with multiple splicing This gene encodes a member of the RNA recognition motif family of RNA-binding proteins. The RNA recognition motif is between 80-100 amino acids in length and family members contain one to four copies of the motif. The RNA recognition motif consists of two short stretches of conserved sequence, as well as a few highly conserved hydrophobic residues. The encoded protein has a single, putative RNA recognition motif in its N-terminus. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
PTP4A1 7803 ENSG00000112245 protein tyrosine phosphatase type IVA, member 1 This gene encodes a member of a small class of prenylated protein tyrosine phosphatases (PTPs), which contain a PTP domain and a characteristic C-terminal prenylation motif. The encoded protein is a cell signaling molecule that plays regulatory roles in a variety of cellular processes, including cell proliferation and migration. The protein may also be involved in cancer development and metastasis. This tyrosine phosphatase is a nuclear protein, but may associate with plasma membrane by means of its prenylation motif. Pseudogenes related to this gene are located on chromosomes 1, 2, 5, 7, 11 and X. NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_",1,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
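
The export step above is repeated for each factor with only the factor index changing. A minimal sketch of a reusable helper follows. The function name `write_factor_genes`, its `drop_notfound` argument, and the toy data frame are hypothetical and not part of the original analysis; the sketch assumes the query result can be coerced with `as.data.frame()` and carries a `query` column and, when some IDs are unmatched, a `notfound` column, as in the tables shown here. Unlike the original call, it also removes duplicate IDs.

```r
# Hypothetical helper (an assumption, not the original code): write the
# queried Ensembl IDs for factor k to a text file, one ID per line.
write_factor_genes <- function(out, k, out_dir, drop_notfound = FALSE) {
  out <- as.data.frame(out)
  ids <- as.character(out$query)
  if (drop_notfound && "notfound" %in% names(out)) {
    # Unmatched queries carry notfound = TRUE; matched rows are NA
    ids <- ids[!(out$notfound %in% TRUE)]
  }
  ids <- unique(ids)  # deviation from the original: duplicates are dropped
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  writeLines(ids, path)
  invisible(path)
}

# Toy stand-in for a queryMany() result, built from IDs in the tables
toy <- data.frame(
  query    = c("ENSG00000137392", "ENSG00000250606", "ENSG00000142615"),
  symbol   = c("CLPS", NA, "CELA2A"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)
path <- write_factor_genes(toy, k = 2, out_dir = tempdir(), drop_notfound = TRUE)
kept <- readLines(path)
```

With `drop_notfound = FALSE` (the default) every queried ID is written, which is closer to what the original `write.table` call does; setting it to `TRUE` leaves out IDs that the annotation service could not match.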

Factor 2 Annotations

out <- mygene::queryMany(gene_list[2,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name summary X_id query symbol notfound
colipase The protein encoded by this gene is a cofactor needed by pancreatic lipase for efficient dietary lipid hydrolysis. It binds to the C-terminal, non-catalytic domain of lipase, thereby stabilizing an active conformation and considerably increasing the overall hydrophobic binding site. The gene product allows lipase to anchor noncovalently to the surface of lipid micelles, counteracting the destabilizing influence of intestinal bile salts. This cofactor is only expressed in pancreatic acinar cells, suggesting regulation of expression by tissue-specific elements. Three transcript variants encoding different isoforms have been found for this gene. 1208 ENSG00000137392 CLPS NA
chymotrypsin like elastase family member 2A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Like most of the human elastases, elastase 2A is secreted from the pancreas as a zymogen. In other species, elastase 2A has been shown to preferentially cleave proteins after leucine, methionine, and phenylalanine residues. 63036 ENSG00000142615 CELA2A NA
regenerating family member 1 beta This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. 5968 ENSG00000172023 REG1B NA
chymotrypsin C This gene encodes a member of the peptidase S1 family. The encoded protein is a serum calcium-decreasing factor that has chymotrypsin-like protease activity. Alternatively spliced transcript variants have been observed, but their full-length nature has not been determined. 11330 ENSG00000162438 CTRC NA
syncollin NA 342898 ENSG00000179751 SYCN NA
chymotrypsinogen B2 NA 440387 ENSG00000168928 CTRB2 NA
pancreatic lipase related protein 1 NA 5407 ENSG00000187021 PNLIPRP1 NA
chymotrypsin like elastase family member 3B Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3B has little elastolytic activity. Like most of the human elastases, elastase 3B is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3B preferentially cleaves proteins after alanine residues. Elastase 3B may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1, and excretion of this protein in fecal material is frequently used as a measure of pancreatic function in clinical assays. 23436 ENSG00000219073 CELA3B NA
pancreatic lipase This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. 5406 ENSG00000175535 PNLIP NA
NA NA NA ENSG00000250606 NA TRUE
NA NA ENSG00000240338 ENSG00000240338 RP11-331F4.4 NA
chymotrypsin like elastase family member 3A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. 10136 ENSG00000142789 CELA3A NA
NA NA NA ENSG00000165862 NA TRUE
carboxypeptidase B1 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma. 1360 ENSG00000153002 CPB1 NA
chymotrypsinogen B1 The protein encoded by this gene is one of a family of serine proteases that is secreted into the gastrointestinal tract as an inactive precursor, which is activated by proteolytic cleavage with trypsin. 1504 ENSG00000168925 CTRB1 NA
regenerating family member 3 alpha This gene encodes a pancreatic secretory protein that may be involved in cell proliferation or differentiation. It has similarity to the C-type lectin superfamily. The enhanced expression of this gene is observed during pancreatic inflammation and liver carcinogenesis. The mature protein also functions as an antimicrobial protein with antibacterial activity. Alternate splicing results in multiple transcript variants that encode the same protein. 5068 ENSG00000172016 REG3A NA
amylase, alpha 2A (pancreatic) This gene encodes a member of the alpha-amylase family of proteins. Amylases are secreted proteins that hydrolyze 1,4-alpha-glucoside bonds in oligosaccharides and polysaccharides, catalyzing the first step in digestion of dietary starch and glycogen. This gene and several family members are present in a gene cluster on chromosome 1. This gene encodes an amylase isoenzyme produced by the pancreas. 279 ENSG00000243480 AMY2A NA
carboxypeptidase A1 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. 1357 ENSG00000091704 CPA1 NA
protease, serine 1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. 5644 ENSG00000204983 PRSS1 NA
chymotrypsin like elastase family member 2B Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Like most of the human elastases, elastase 2B is secreted from the pancreas as a zymogen. In other species, elastase 2B has been shown to preferentially cleave proteins after leucine, methionine, and phenylalanine residues. 51032 ENSG00000215704 CELA2B NA
glycoprotein 2 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. 2813 ENSG00000169347 GP2 NA
phospholipase A2 group IB This gene encodes a secreted member of the phospholipase A2 (PLA2) class of enzymes, which is produced by the pancreatic acinar cells. The encoded calcium-dependent enzyme catalyzes the hydrolysis of the sn-2 position of membrane glycerophospholipids to release arachidonic acid (AA) and lysophospholipids. AA is subsequently converted by downstream metabolic enzymes to several bioactive lipophilic compounds (eicosanoids), including prostaglandins (PGs) and leukotrienes (LTs). The enzyme may be involved in several physiological processes including cell contraction, cell proliferation and pathological response. 5319 ENSG00000170890 PLA2G1B NA
alpha-1-microglobulin/bikunin precursor This gene encodes a complex glycoprotein secreted in plasma. The precursor is proteolytically processed into distinct functioning proteins: alpha-1-microglobulin, which belongs to the superfamily of lipocalin transport proteins and may play a role in the regulation of inflammatory processes, and bikunin, which is a urinary trypsin inhibitor belonging to the superfamily of Kunitz-type protease inhibitors and plays an important role in many physiological and pathological processes. This gene is located on chromosome 9 in a cluster of lipocalin genes. 259 ENSG00000106927 AMBP NA
CD44 molecule (Indian blood group) The protein encoded by this gene is a cell-surface glycoprotein involved in cell-cell interactions, cell adhesion and migration. It is a receptor for hyaluronic acid (HA) and can also interact with other ligands, such as osteopontin, collagens, and matrix metalloproteinases (MMPs). This protein participates in a wide variety of cellular functions including lymphocyte activation, recirculation and homing, hematopoiesis, and tumor metastasis. Transcripts for this gene undergo complex alternative splicing that results in many functionally distinct isoforms, however, the full length nature of some of these variants has not been determined. Alternative splicing is the basis for the structural and functional diversity of this protein, and may be related to tumor metastasis. 960 ENSG00000026508 CD44 NA
metallothionein 1G NA 4495 ENSG00000125144 MT1G NA
ADIRF antisense RNA 1 NA ENSG00000272734 ENSG00000272734 ADIRF-AS1 NA
growth differentiation factor 15 The protein encoded by this gene belongs to the transforming growth factor-beta (TGF-beta) family. The protein is expressed in a broad range of cell types, acts as a pleiotropic cytokine and is involved in the stress response program of cells after cellular injury. Increased protein levels are associated with disease states such as tissue hypoxia, inflammation, acute injury and oxidative stress. 9518 ENSG00000130513 GDF15 NA
carboxypeptidase A2 Three different forms of human pancreatic procarboxypeptidase A have been isolated. The encoded protein represents the A2 form, which is a monomeric protein with different biochemical properties from the A1 and A3 forms. The A2 form of pancreatic procarboxypeptidase acts on aromatic C-terminal residues and is a secreted protein. 1358 ENSG00000158516 CPA2 NA
NA NA ENSG00000255443 ENSG00000255443 RP1-68D18.4 NA
Rho guanine nucleotide exchange factor 28 This gene encodes a member of the Rho guanine nucleotide exchange factor family. The encoded protein interacts with low molecular weight neurofilament mRNA and may be involved in the formation of amyotrophic lateral sclerosis neurofilament aggregates. Alternate splicing results in multiple transcript variants. 64283 ENSG00000214944 ARHGEF28 NA
transmembrane protein 52 NA 339456 ENSG00000178821 TMEM52 NA
NA NA ENSG00000266844 ENSG00000266844 RP11-862L9.3 NA
family with sequence similarity 174 member B NA 400451 ENSG00000185442 FAM174B NA
albumin Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. 213 ENSG00000163631 ALB NA
GLIS family zinc finger 3 This gene is a member of the GLI-similar zinc finger protein family and encodes a nuclear protein with five C2H2-type zinc finger domains. This protein functions as both a repressor and activator of transcription and is specifically involved in the development of pancreatic beta cells, the thyroid, eye, liver and kidney. Mutations in this gene have been associated with neonatal diabetes and congenital hypothyroidism (NDH). Alternatively spliced variants that encode different protein isoforms have been described but the full-length nature of only two have been determined. 169792 ENSG00000107249 GLIS3 NA
delta like non-canonical Notch ligand 1 This gene encodes a transmembrane protein that contains multiple epidermal growth factor repeats that functions as a regulator of cell growth. The encoded protein is involved in the differentiation of several cell types including adipocytes. This gene is located in a region of chromosome 14 frequently showing uniparental disomy, and is imprinted and expressed from the paternal allele. A single nucleotide variant in this gene is associated with child and adolescent obesity and shows polar overdominance, where heterozygotes carrying an active paternal allele express the phenotype, while mutant homozygotes are normal. 8788 ENSG00000185559 DLK1 NA
platelet derived growth factor D The protein encoded by this gene is a member of the platelet-derived growth factor family. The four members of this family are mitogenic factors for cells of mesenchymal origin and are characterized by a core motif of eight cysteines, seven of which are found in this factor. This gene product only forms homodimers and, therefore, does not dimerize with the other three family members. It differs from alpha and beta members of this family in having an unusual N-terminal domain, the CUB domain. Two splice variants have been identified for this gene. 80310 ENSG00000170962 PDGFD NA
X-box binding protein 1 This gene encodes a transcription factor that regulates MHC class II genes by binding to a promoter element referred to as an X box. This gene product is a bZIP protein, which was also identified as a cellular transcription factor that binds to an enhancer in the T cell leukemia virus type 1 promoter. It may increase expression of viral proteins by acting as the DNA binding partner of a viral transactivator. It has been found that upon accumulation of unfolded proteins in the endoplasmic reticulum (ER), the mRNA of this gene is processed to an active form by an unconventional splicing mechanism that is mediated by the endonuclease inositol-requiring enzyme 1 (IRE1). The resulting loss of 26 nt from the spliced mRNA causes a frame-shift and an isoform XBP1(S), which is the functionally active transcription factor. The isoform encoded by the unspliced mRNA, XBP1(U), is constitutively expressed, and thought to function as a negative feedback regulator of XBP1(S), which shuts off transcription of target genes during the recovery phase of ER stress. A pseudogene of XBP1 has been identified and localized to chromosome 5. 7494 ENSG00000100219 XBP1 NA
uncharacterized LOC100506314 NA 100506314 ENSG00000247498 LOC100506314 NA
nuclear protein 1, transcriptional regulator NA 26471 ENSG00000176046 NUPR1 NA
arginase 2 Arginase catalyzes the hydrolysis of arginine to ornithine and urea. At least two isoforms of mammalian arginase exist (types I and II) which differ in their tissue distribution, subcellular localization, immunologic crossreactivity and physiologic function. The type II isoform encoded by this gene is located in the mitochondria and expressed in extra-hepatic tissues, especially kidney. The physiologic role of this isoform is poorly understood; it is thought to play a role in nitric oxide and polyamine metabolism. Transcript variants of the type II gene resulting from the use of alternative polyadenylation sites have been described. 384 ENSG00000081181 ARG2 NA
small nucleolar RNA host gene 25 NA ENSG00000266402 ENSG00000266402 SNHG25 NA
cytochrome P450 family 27 subfamily B member 1 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. The protein encoded by this gene localizes to the inner mitochondrial membrane where it hydroxylates 25-hydroxyvitamin D3 at the 1alpha position. This reaction synthesizes 1alpha,25-dihydroxyvitamin D3, the active form of vitamin D3, which binds to the vitamin D receptor and regulates calcium metabolism. Thus this enzyme regulates the level of biologically active vitamin D and plays an important role in calcium homeostasis. Mutations in this gene can result in vitamin D-dependent rickets type I. 1594 ENSG00000111012 CYP27B1 NA
eukaryotic translation initiation factor 4E binding protein 1 This gene encodes one member of a family of translation repressor proteins. The protein directly interacts with eukaryotic translation initiation factor 4E (eIF4E), which is a limiting component of the multisubunit complex that recruits 40S ribosomal subunits to the 5’ end of mRNAs. Interaction of this protein with eIF4E inhibits complex assembly and represses translation. This protein is phosphorylated in response to various signals including UV irradiation and insulin signaling, resulting in its dissociation from eIF4E and activation of mRNA translation. 1978 ENSG00000187840 EIF4EBP1 NA
transthyretin This gene encodes transthyretin, one of the three prealbumins including alpha-1-antitrypsin, transthyretin and orosomucoid. Transthyretin is a carrier protein; it transports thyroid hormones in the plasma and cerebrospinal fluid, and also transports retinol (vitamin A) in the plasma. The protein consists of a tetramer of identical subunits. More than 80 different mutations in this gene have been reported; most mutations are related to amyloid deposition, affecting predominantly peripheral nerve and/or the heart, and a small portion of the gene mutations is non-amyloidogenic. The diseases caused by mutations include amyloidotic polyneuropathy, euthyroid hyperthyroxinaemia, amyloidotic vitreous opacities, cardiomyopathy, oculoleptomeningeal amyloidosis, meningocerebrovascular amyloidosis, carpal tunnel syndrome, etc. 7276 ENSG00000118271 TTR NA
transmembrane channel like 4 NA 147798 ENSG00000167608 TMC4 NA
regenerating family member 1 alpha This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. 5967 ENSG00000115386 REG1A NA
syndecan 1 The protein encoded by this gene is a transmembrane (type I) heparan sulfate proteoglycan and is a member of the syndecan proteoglycan family. The syndecans mediate cell binding, cell signaling, and cytoskeletal organization and syndecan receptors are required for internalization of the HIV-1 tat protein. The syndecan-1 protein functions as an integral membrane protein and participates in cell proliferation, cell migration and cell-matrix interactions via its receptor for extracellular matrix proteins. Altered syndecan-1 expression has been detected in several different tumor types. While several transcript variants may exist for this gene, the full-length natures of only two have been described to date. These two represent the major variants of this gene and encode the same protein. 6382 ENSG00000115884 SDC1 NA
phosphatidylinositol glycan anchor biosynthesis class H pseudogene 1 NA ENSG00000259657 ENSG00000259657 PIGHP1 NA
glycine N-methyltransferase The protein encoded by this gene is an enzyme that catalyzes the conversion of S-adenosyl-L-methionine (along with glycine) to S-adenosyl-L-homocysteine and sarcosine. This protein is found in the cytoplasm and acts as a homotetramer. Defects in this gene are a cause of GNMT deficiency (hypermethioninemia). Alternative splicing results in multiple transcript variants. Naturally occurring readthrough transcription occurs between the upstream CNPY3 (canopy FGF signaling regulator 3) gene and this gene and is represented with GeneID:107080644. 27232 ENSG00000124713 GNMT NA
tumor-associated calcium signal transducer 2 This intronless gene encodes a carcinoma-associated antigen. This antigen is a cell surface receptor that transduces calcium signals. Mutations of this gene have been associated with gelatinous drop-like corneal dystrophy. 4070 ENSG00000184292 TACSTD2 NA
NA NA ENSG00000234981 ENSG00000234981 RP11-534L20.4 NA
NA NA ENSG00000235795 ENSG00000235795 RP11-421L21.2 NA
NA NA ENSG00000228444 ENSG00000228444 RP11-173B14.4 NA
hematopoietically expressed homeobox This gene encodes a member of the homeobox family of transcription factors, many of which are involved in developmental processes. Expression in specific hematopoietic lineages suggests that this protein may play a role in hematopoietic differentiation. 3087 ENSG00000152804 HHEX NA
HIG1 hypoxia inducible domain family member 1B This gene encodes a member of the hypoxia inducible gene 1 (HIG1) domain family. The encoded protein is localized to the cell membrane and has been linked to tumorigenesis and the progression of pituitary adenomas. Alternative splicing results in multiple transcript variants. 51751 ENSG00000131097 HIGD1B NA
galactosidase beta 1 like 2 NA 89944 ENSG00000149328 GLB1L2 NA
serine peptidase inhibitor, Kazal type 1 The protein encoded by this gene is a trypsin inhibitor, which is secreted from pancreatic acinar cells into pancreatic juice. It is thought to function in the prevention of trypsin-catalyzed premature activation of zymogens within the pancreas and the pancreatic duct. Mutations in this gene are associated with hereditary pancreatitis and tropical calcific pancreatitis. 6690 ENSG00000164266 SPINK1 NA
nucleophosmin/nucleoplasmin 2 NA 10361 ENSG00000158806 NPM2 NA
tissue factor pathway inhibitor 2 This gene encodes a member of the Kunitz-type serine proteinase inhibitor family. The protein can inhibit a variety of serine proteases including factor VIIa/tissue factor, factor Xa, plasmin, trypsin, chymotrypsin and plasma kallikrein. This gene has been identified as a tumor suppressor gene in several types of cancer. Alternative splicing results in multiple transcript variants. 7980 ENSG00000105825 TFPI2 NA
tumor necrosis factor receptor superfamily member 12A NA 51330 ENSG00000006327 TNFRSF12A NA
solute carrier family 39 member 11 NA 201266 ENSG00000133195 SLC39A11 NA
eukaryotic translation elongation factor 1 alpha 1 pseudogene 9 NA ENSG00000249264 ENSG00000249264 EEF1A1P9 NA
vesicle associated membrane protein 8 This gene encodes an integral membrane protein that belongs to the synaptobrevin/vesicle-associated membrane protein subfamily of soluble N-ethylmaleimide-sensitive factor attachment protein receptors (SNAREs). The encoded protein is involved in the fusion of synaptic vesicles with the presynaptic membrane. 8673 ENSG00000118640 VAMP8 NA
TRAF3 interacting protein 2 This gene encodes a protein involved in regulating responses to cytokines by members of the Rel/NF-kappaB transcription factor family. These factors play a central role in innate immunity in response to pathogens, inflammatory signals and stress. This gene product interacts with TRAF proteins (tumor necrosis factor receptor-associated factors) and either I-kappaB kinase or MAP kinase to activate either NF-kappaB or Jun kinase. Several alternative transcripts encoding different isoforms have been identified. Another transcript, which does not encode a protein and is transcribed in the opposite orientation, has been identified. Overexpression of this transcript has been shown to reduce expression of at least one of the protein encoding transcripts, suggesting it has a regulatory role in the expression of this gene. 10758 ENSG00000056972 TRAF3IP2 NA
zinc finger protein 215 NA 7762 ENSG00000149054 ZNF215 NA
thrombospondin 4 The protein encoded by this gene belongs to the thrombospondin protein family. Thrombospondin family members are adhesive glycoproteins that mediate cell-to-cell and cell-to-matrix interactions. This protein forms a pentamer and can bind to heparin and calcium. It is involved in local signaling in the developing and adult nervous system, and it contributes to spinal sensitization and neuropathic pain states. This gene is activated during the stromal response to invasive breast cancer. It may also play a role in inflammatory responses in Alzheimer’s disease. Alternative splicing results in multiple transcript variants. 7060 ENSG00000113296 THBS4 NA
apoptosis enhancing nuclease NA 64782 ENSG00000181026 AEN NA
RAB11 family interacting protein 1 This gene encodes one of the Rab11-family interacting proteins (Rab11-FIPs), which play a role in the Rab-11 mediated recycling of vesicles. The encoded protein may be involved in endocytic sorting, trafficking of proteins including integrin subunits and epidermal growth factor receptor (EGFR), and transport between the recycling endosome and the trans-Golgi network. Alternative splicing results in multiple transcript variants. A pseudogene is described on the X chromosome. 80223 ENSG00000156675 RAB11FIP1 NA
solute carrier family 39 member 14 Zinc is an essential cofactor for hundreds of enzymes. It is involved in protein, nucleic acid, carbohydrate, and lipid metabolism, as well as in the control of gene transcription, growth, development, and differentiation. SLC39A14 belongs to a subfamily of proteins that show structural characteristics of zinc transporters (Taylor and Nicholson, 2003 [PubMed 12659941]). 23516 ENSG00000104635 SLC39A14 NA
T-cell immune regulator 1, ATPase H+ transporting V0 subunit a3 Through alternate splicing, this gene encodes two proteins with similarity to subunits of the vacuolar ATPase (V-ATPase) but the encoded proteins seem to have different functions. V-ATPase is a multisubunit enzyme that mediates acidification of eukaryotic intracellular organelles. V-ATPase dependent organelle acidification is necessary for such intracellular processes as protein sorting, zymogen activation, and receptor-mediated endocytosis. V-ATPase is comprised of a cytosolic V1 domain and a transmembrane V0 domain. Mutations in this gene are associated with infantile malignant osteopetrosis. 10312 ENSG00000110719 TCIRG1 NA
NA NA NA ENSG00000225410 NA TRUE
G0/G1 switch 2 NA 50486 ENSG00000123689 G0S2 NA
methyltransferase like 1 This gene is similar in sequence to the S. cerevisiae YDL201w gene. The gene product contains a conserved S-adenosylmethionine-binding motif and is inactivated by phosphorylation. Alternative splice variants encoding different protein isoforms have been described for this gene. A pseudogene has been identified on chromosome X. 4234 ENSG00000037897 METTL1 NA
KIAA0922 NA 23240 ENSG00000121210 KIAA0922 NA
actin, gamma 2, smooth muscle, enteric Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. 72 ENSG00000163017 ACTG2 NA
NA NA ENSG00000272502 ENSG00000272502 RP11-713M15.2 NA
tripartite motif containing 5 The protein encoded by this gene is a member of the tripartite motif (TRIM) family. The TRIM motif includes three zinc-binding domains, a RING, a B-box type 1 and a B-box type 2, and a coiled-coil region. The protein forms homo-oligomers via the coiled-coil region and localizes to cytoplasmic bodies. It appears to function as an E3 ubiquitin-ligase and ubiquitinates itself to regulate its subcellular localization. It may play a role in retroviral restriction. Multiple alternatively spliced transcript variants encoding different isoforms have been described for this gene. 85363 ENSG00000132256 TRIM5 NA
MATN1 antisense RNA 1 NA 100129196 ENSG00000186056 MATN1-AS1 NA
alanyl aminopeptidase, membrane Aminopeptidase N is located in the small-intestinal and renal microvillar membrane, and also in other plasma membranes. In the small intestine aminopeptidase N plays a role in the final digestion of peptides generated from hydrolysis of proteins by gastric and pancreatic proteases. Its function in proximal tubular epithelial cells and other cell types is less clear. The large extracellular carboxyterminal domain contains a pentapeptide consensus sequence characteristic of members of the zinc-binding metalloproteinase superfamily. Sequence comparisons with known enzymes of this class showed that CD13 and aminopeptidase N are identical. The latter enzyme was thought to be involved in the metabolism of regulatory peptides by diverse cell types, including small intestinal and renal tubular epithelial cells, macrophages, granulocytes, and synaptic membranes from the CNS. Human aminopeptidase N is a receptor for one strain of human coronavirus that is an important cause of upper respiratory tract infections. Defects in this gene appear to be a cause of various types of leukemia or lymphoma. 290 ENSG00000166825 ANPEP NA
nuclear paraspeckle assembly transcript 1 (non-protein coding) This gene produces a long non-coding RNA (lncRNA) transcribed from the multiple endocrine neoplasia locus. This lncRNA is retained in the nucleus where it forms the core structural component of the paraspeckle sub-organelles. It may act as a transcriptional regulator for numerous genes, including some genes involved in cancer progression. 283131 ENSG00000245532 NEAT1 NA
NA NA ENSG00000266680 ENSG00000266680 RP5-1148A21.3 NA
family with sequence similarity 134 member B The protein encoded by this gene is a cis-Golgi transmembrane protein that may be necessary for the long-term survival of nociceptive and autonomic ganglion neurons. Mutations in this gene are a cause of hereditary sensory and autonomic neuropathy type IIB (HSAN IIB), and this gene may also play a role in susceptibility to vascular dementia. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 54463 ENSG00000154153 FAM134B NA
chromosome 12 open reading frame 45 NA 121053 ENSG00000151131 C12orf45 NA
small Cajal body-specific RNA 2 NA 677766 ENSG00000270066 SCARNA2 NA
FARP1 antisense RNA 1 NA ENSG00000231194 ENSG00000231194 FARP1-AS1 NA
tumor necrosis factor receptor superfamily member 19 The protein encoded by this gene is a member of the TNF-receptor superfamily. This receptor is highly expressed during embryonic development. It has been shown to interact with TRAF family members, and to activate JNK signaling pathway when overexpressed in cells. This receptor is capable of inducing apoptosis by a caspase-independent mechanism, and it is thought to play an essential role in embryonic development. Alternatively spliced transcript variants encoding distinct isoforms have been described. 55504 ENSG00000127863 TNFRSF19 NA
ERBB receptor feedback inhibitor 1 ERRFI1 is a cytoplasmic protein whose expression is upregulated with cell growth (Wick et al., 1995 [PubMed 7641805]). It shares significant homology with the protein product of rat gene-33, which is induced during cell stress and mediates cell signaling (Makkinje et al., 2000 [PubMed 10749885]; Fiorentino et al., 2000 [PubMed 11003669]). 54206 ENSG00000116285 ERRFI1 NA
membrane palmitoylated protein 6 Members of the peripheral membrane-associated guanylate kinase (MAGUK) family function in tumor suppression and receptor clustering by forming multiprotein complexes containing distinct sets of transmembrane, cytoskeletal, and cytoplasmic signaling proteins. All MAGUKs contain a PDZ-SH3-GUK core and are divided into 4 subfamilies, DLG-like (see DLG1; MIM 601014), ZO1-like (see TJP1; MIM 601009), p55-like (see MPP1; MIM 305360), and LIN2-like (see CASK; MIM 300172), based on their size and the presence of additional domains. MPP6 is a member of the p55-like MAGUK subfamily (Tseng et al., 2001 [PubMed 11311936]). 51678 ENSG00000105926 MPP6 NA
cell division cycle 20 pseudogene 1 NA ENSG00000231007 ENSG00000231007 CDC20P1 NA
EMI domain containing 1 NA 129080 ENSG00000186998 EMID1 NA
adhesion G protein-coupled receptor G1 This gene encodes a member of the G protein-coupled receptor family and regulates brain cortical patterning. The encoded protein binds specifically to transglutaminase 2, a component of tissue and tumor stroma implicated as an inhibitor of tumor progression. Mutations in this gene are associated with a brain malformation known as bilateral frontoparietal polymicrogyria. Alternative splicing results in multiple transcript variants. 9289 ENSG00000205336 ADGRG1 NA
methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 2, methenyltetrahydrofolate cyclohydrolase This gene encodes a nuclear-encoded mitochondrial bifunctional enzyme with methylenetetrahydrofolate dehydrogenase and methenyltetrahydrofolate cyclohydrolase activities. The enzyme functions as a homodimer and is unique in its absolute requirement for magnesium and inorganic phosphate. Formation of the enzyme-magnesium complex allows binding of NAD. Alternative splicing results in two different transcripts, one protein-coding and the other not protein-coding. This gene has a pseudogene on chromosome 7. 10797 ENSG00000065911 MTHFD2 NA
fibroblast growth factor 18 The protein encoded by this gene is a member of the fibroblast growth factor (FGF) family. FGF family members possess broad mitogenic and cell survival activities, and are involved in a variety of biological processes, including embryonic development, cell growth, morphogenesis, tissue repair, tumor growth, and invasion. It has been shown in vitro that this protein is able to induce neurite outgrowth in PC12 cells. Studies of the similar proteins in mouse and chick suggested that this protein is a pleiotropic growth factor that stimulates proliferation in a number of tissues, most notably the liver and small intestine. Knockout studies of the similar gene in mice implied the role of this protein in regulating proliferation and differentiation of midline cerebellar structures. 8817 ENSG00000156427 FGF18 NA
zinc finger protein 321, pseudogene NA 399669 ENSG00000213801 ZNF321P NA
caspase 4 This gene encodes a protein that is a member of the cysteine-aspartic acid protease (caspase) family. Sequential activation of caspases plays a central role in the execution-phase of cell apoptosis. Caspases exist as inactive proenzymes composed of a prodomain and a large and small protease subunit. Activation of caspases requires proteolytic processing at conserved internal aspartic residues to generate a heterodimeric enzyme consisting of the large and small subunits. This caspase is able to cleave and activate its own precursor protein, as well as caspase 1 precursor. When overexpressed, this gene induces cell apoptosis. Alternative splicing results in transcript variants encoding distinct isoforms. 837 ENSG00000196954 CASP4 NA
NA NA ENSG00000197813 ENSG00000197813 CTC-301O7.4 NA
capping actin protein, gelsolin like This gene encodes a member of the gelsolin/villin family of actin-regulatory proteins. The encoded protein reversibly blocks the barbed ends of F-actin filaments in a Ca2+ and phosphoinositide-regulated manner, but does not sever preformed actin filaments. By capping the barbed ends of actin filaments, the encoded protein contributes to the control of actin-based motility in non-muscle cells. Alternatively spliced transcript variants have been observed for this gene. 822 ENSG00000042493 CAPG NA
target of myb1 like 1 membrane trafficking protein NA 10040 ENSG00000141198 TOM1L1 NA
transmembrane BAX inhibitor motif containing 1 NA 64114 ENSG00000135926 TMBIM1 NA
# Save the queried Ensembl IDs for factor 2, one per line
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 2, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);
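
The query, table, and write steps are repeated for every factor, and the column order of the printed tables differs between factors. A minimal sketch of a wrapper that fixes both is given here. The names `annotate_factor`, `out_dir`, `query_fun` and `col_order` are hypothetical and not part of the original analysis; `query_fun` defaults to `mygene::queryMany` with the same arguments used in this document, and can be swapped for a stub when working offline.

```r
# Hypothetical helper (not part of the original analysis): annotate the genes
# of factor k, return a table with a fixed column order, and save the IDs.
annotate_factor <- function(k, gene_list, out_dir,
                            query_fun = mygene::queryMany,
                            col_order = c("query", "symbol", "name",
                                          "X_id", "summary")) {
  out <- as.data.frame(query_fun(gene_list[k, ],
                                 scopes = "ensembl.gene",
                                 fields = c("name", "summary", "symbol"),
                                 species = "human"))
  # Add any expected column the query did not return, so that every
  # factor's table has the same shape
  for (col in setdiff(col_order, names(out))) {
    out[[col]] <- rep(NA, nrow(out))
  }
  out <- out[, col_order, drop = FALSE]
  # One Ensembl ID per line, as in the write.table() calls in this document
  write.table(out$query,
              file.path(out_dir, paste0("gene_names_clus_", k, ".txt")),
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  out
}
```

Under these assumptions, one factor could be annotated with `kable(annotate_factor(3, gene_list, "../utilities/GTEX2013_sparse_load_voom"))`, and all of them with a loop over `seq_len(nrow(gene_list))`.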

Factor 3 Annotations

out <- mygene::queryMany(gene_list[3,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
kable(as.data.frame(out))
summary X_id query symbol name
The protein encoded by this gene is a member of the dual specificity protein phosphatase subfamily. These phosphatases inactivate their target kinases by dephosphorylating both the phosphoserine/threonine and phosphotyrosine residues. They negatively regulate members of the mitogen-activated protein (MAP) kinase superfamily (MAPK/ERK, SAPK/JNK, p38), which are associated with cellular proliferation and differentiation. Different members of the family of dual specificity phosphatases show distinct substrate specificities for various MAP kinases, different tissue distribution and subcellular localization, and different modes of inducibility of their expression by extracellular stimuli. This gene product inactivates ERK1, ERK2 and JNK, is expressed in a variety of tissues, and is localized in the nucleus. Two alternatively spliced transcript variants, encoding distinct isoforms, have been observed for this gene. In addition, multiple polyadenylation sites have been reported. 1846 ENSG00000120875 DUSP4 dual specificity phosphatase 4
This gene encodes a major histocompatibility complex (MHC) class I-related molecule that binds to the NKG2D receptor on natural killer (NK) cells to trigger release of multiple cytokines and chemokines that in turn contribute to the recruitment and activation of NK cells. The encoded protein undergoes further processing to generate the mature protein that is either anchored to membrane via a glycosylphosphatidylinositol moiety, or secreted. Many malignant cells secrete the encoded protein to evade immunosurveillance by NK cells. This gene is located in a cluster of multiple MHC class I-related genes on chromosome 6. 80328 ENSG00000131015 ULBP2 UL16 binding protein 2
NA 401491 ENSG00000236404 VLDLR-AS1 VLDLR antisense RNA 1
This gene encodes a member of the homer family of dendritic proteins. Members of this family regulate group 1 metabotropic glutamate receptor function. The encoded protein is a postsynaptic density scaffolding protein. Alternative splicing results in multiple transcript variants. Two related pseudogenes have been identified on chromosome 14. 9455 ENSG00000103942 HOMER2 homer scaffolding protein 2
This gene encodes a type-I integral membrane glycoprotein with diverse distribution in human tissues. The physiological function of this protein may be related to its mucin-type character. The homologous protein in other species has been described as a differentiation antigen and influenza-virus receptor. The specific function of this protein has not been determined but it has been proposed as a marker of lung injury. Alternatively spliced transcript variants encoding different isoforms have been identified. 10630 ENSG00000162493 PDPN podoplanin
This gene encodes a mitochondrial alanine transaminase, a pyridoxal enzyme that catalyzes the reversible transamination between alanine and 2-oxoglutarate to generate pyruvate and glutamate. Alanine transaminases play roles in gluconeogenesis and amino acid metabolism in many tissues including skeletal muscle, kidney, and liver. Activating transcription factor 4 upregulates this gene under metabolic stress conditions in hepatocyte cell lines. A loss of function mutation in this gene has been associated with developmental encephalopathy. Alternative splicing results in multiple transcript variants. 84706 ENSG00000166123 GPT2 glutamic pyruvate transaminase (alanine aminotransferase) 2
The protein encoded by this gene is a member of the dual specificity protein phosphatase subfamily. These phosphatases inactivate their target kinases by dephosphorylating both the phosphoserine/threonine and phosphotyrosine residues. They negatively regulate members of the mitogen-activated protein (MAP) kinase superfamily (MAPK/ERK, SAPK/JNK, p38), which are associated with cellular proliferation and differentiation. Different members of the family of dual specificity phosphatases show distinct substrate specificities for various MAP kinases, different tissue distribution and subcellular localization, and different modes of inducibility of their expression by extracellular stimuli. This gene product inactivates ERK1, is expressed in a variety of tissues with the highest levels in pancreas and brain, and is localized in the nucleus. 1847 ENSG00000138166 DUSP5 dual specificity phosphatase 5
The protein encoded by this gene contains a RING finger, a motif known to be involved in protein-DNA and protein-protein interactions. The mouse counterpart of this protein has been shown to interact with Ube2l3/UbcM4, which is a ubiquitin-conjugating enzyme involved in embryonic development. 9781 ENSG00000151692 RNF144A ring finger protein 144A
Integrins are heterodimeric transmembrane receptor proteins that mediate numerous cellular processes including cell adhesion, cytoskeletal rearrangement, and activation of cell signaling pathways. Integrins are composed of alpha and beta subunits. This gene encodes the alpha 8 subunit of the heterodimeric integrin alpha8beta1 protein. The encoded protein is a single-pass type 1 membrane protein that contains multiple FG-GAP repeats. This repeat is predicted to fold into a beta propeller structure. This gene regulates the recruitment of mesenchymal cells into epithelial structures, mediates cell-cell interactions, and regulates neurite outgrowth of sensory and motor neurons. The integrin alpha8beta1 protein thus plays an important role in wound-healing and organogenesis. Mutations in this gene have been associated with renal hypodysplasia/aplasia-1 (RHDA1) and with several animal models of chronic kidney disease. Alternate splicing results in multiple transcript variants encoding distinct isoforms. 8516 ENSG00000077943 ITGA8 integrin subunit alpha 8
The low density lipoprotein receptor (LDLR) gene family consists of cell surface proteins involved in receptor-mediated endocytosis of specific ligands. This gene encodes a lipoprotein receptor that is a member of the LDLR family and plays important roles in VLDL-triglyceride metabolism and the reelin signaling pathway. Mutations in this gene cause VLDLR-associated cerebellar hypoplasia. Alternative splicing generates multiple transcript variants encoding distinct isoforms for this gene. 7436 ENSG00000147852 VLDLR very low density lipoprotein receptor
Acetylcholine receptors at mature mammalian neuromuscular junctions are pentameric protein complexes composed of four subunits in the ratio of two alpha subunits to one beta, one epsilon, and one delta subunit. The acetylcholine receptor changes subunit composition shortly after birth when the epsilon subunit replaces the gamma subunit seen in embryonic receptors. Mutations in the epsilon subunit are associated with congenital myasthenic syndrome. 1145 ENSG00000108556 CHRNE cholinergic receptor nicotinic epsilon subunit
This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. 2813 ENSG00000169347 GP2 glycoprotein 2
NA 57596 ENSG00000183092 BEGAIN brain enriched guanylate kinase associated
NA ENSG00000256304 ENSG00000256304 CCDC150P1 coiled-coil domain containing 150 pseudogene 1
The protein encoded by this gene is a cytokine that controls the production, differentiation, and function of granulocytes. The active protein is found extracellularly. Alternatively spliced transcript variants have been described for this gene. 1440 ENSG00000108342 CSF3 colony stimulating factor 3
NA 123591 ENSG00000169758 TMEM266 transmembrane protein 266
NA ENSG00000255201 ENSG00000255201 RP11-350N15.4 NA
This gene encodes a member of the BCL-2 protein family. The proteins of this family form hetero- or homodimers and act as anti- and pro-apoptotic regulators that are involved in a wide variety of cellular activities such as embryonic development, homeostasis and tumorigenesis. The protein encoded by this gene is able to reduce the release of pro-apoptotic cytochrome c from mitochondria and block caspase activation. This gene is a direct transcription target of NF-kappa B in response to inflammatory mediators, and is up-regulated by different extracellular signals, such as granulocyte-macrophage colony-stimulating factor (GM-CSF), CD40, phorbol ester and inflammatory cytokine TNF and IL-1, which suggests a cytoprotective function that is essential for lymphocyte activation as well as cell survival. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 597 ENSG00000140379 BCL2A1 BCL2 related protein A1
NA ENSG00000253785 ENSG00000253785 CTC-308K20.3 NA
This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. 5644 ENSG00000204983 PRSS1 protease, serine 1
The protein encoded by this gene is induced by cyclic mechanical stretching in trabecular cells of the eye and it is also expressed in retina. This protein may play a role in trabecular meshwork function and the development of glaucoma. 10896 ENSG00000262180 OCLM oculomedin
The protein encoded by this gene is a trypsin inhibitor, which is secreted from pancreatic acinar cells into pancreatic juice. It is thought to function in the prevention of trypsin-catalyzed premature activation of zymogens within the pancreas and the pancreatic duct. Mutations in this gene are associated with hereditary pancreatitis and tropical calcific pancreatitis. 6690 ENSG00000164266 SPINK1 serine peptidase inhibitor, Kazal type 1
The sphingolipid metabolite sphingosine-1-phosphate promotes cell proliferation and survival, whereas its precursor, sphingosine, has the opposite effect. The ceramidase ACER2 hydrolyzes very long chain ceramides to generate sphingosine (Xu et al., 2006 [PubMed 16940153]). 340485 ENSG00000177076 ACER2 alkaline ceramidase 2
NA ENSG00000261575 ENSG00000261575 RP11-259G18.1 NA
NA ENSG00000236364 ENSG00000236364 RP11-525G13.2 NA
NA ENSG00000259326 ENSG00000259326 RP11-102L12.2 NA
This gene encodes a transcriptional regulator that belongs to the EGR family of C2H2-type zinc-finger proteins. It is an immediate-early growth response gene which is induced by mitogenic stimulation. The protein encoded by this gene participates in the transcriptional regulation of genes controlling biological rhythm. It may also play a role in a wide variety of processes including muscle development, lymphocyte development, endothelial cell growth and migration, and neuronal development. Alternative splicing results in multiple transcript variants encoding distinct isoforms. 1960 ENSG00000179388 EGR3 early growth response 3
NA ENSG00000258895 ENSG00000258895 CTD-2643K12.1 NA
The protein encoded by this gene is found as a pentamer and is a major substrate for the cAMP-dependent protein kinase in cardiac muscle. The encoded protein is an inhibitor of cardiac muscle sarcoplasmic reticulum Ca(2+)-ATPase in the unphosphorylated state, but inhibition is relieved upon phosphorylation of the protein. The subsequent activation of the Ca(2+) pump leads to enhanced muscle relaxation rates, thereby contributing to the inotropic response elicited in heart by beta-agonists. The encoded protein is a key regulator of cardiac diastolic function. Mutations in this gene are a cause of inherited human dilated cardiomyopathy with refractory congestive heart failure, and also familial hypertrophic cardiomyopathy. 5350 ENSG00000198523 PLN phospholamban
This gene encodes a member of the ankyrin repeat and SOCS box-containing (ASB) protein family. These proteins play a role in protein degradation by coupling suppressor of cytokine signalling (SOCS) proteins with the elongin BC complex. The encoded protein is a subunit of a multimeric E3 ubiquitin ligase complex that mediates the degradation of actin-binding proteins. This gene plays a role in retinoic acid-induced growth inhibition and differentiation of myeloid leukemia cells. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 51676 ENSG00000100628 ASB2 ankyrin repeat and SOCS box containing 2
NA 101928399 ENSG00000237989 TCONS_00029157 uncharacterized LOC101928399
The protein encoded by this gene is a DNA polymerase involved in base excision and repair, also called gap-filling DNA synthesis. The encoded protein, acting as a monomer, is normally found in the cytoplasm, but it translocates to the nucleus upon DNA damage. Several transcript variants of this gene exist, but the full-length nature of only one has been described to date. 5423 ENSG00000070501 POLB polymerase (DNA) beta
NA 26784 ENSG00000207405 SNORA64 small nucleolar RNA, H/ACA box 64
The protein encoded by this gene is a peripheral membrane protein that is a component of tight junctions or TJs. TJs form an apical junctional structure and act to control paracellular permeability and maintain cell polarity. This protein is related to angiomotin, an angiostatin binding protein that regulates endothelial cell migration and capillary formation. Two transcript variants encoding different isoforms have been found for this gene. 154810 ENSG00000166025 AMOTL1 angiomotin like 1
NA 201181 ENSG00000187595 ZNF385C zinc finger protein 385C
NA ENSG00000255513 ENSG00000255513 AC005363.9 NA
This gene encodes a protein that contains several helicase family domains. Mutations in this gene have been found in some patients with the CHARGE syndrome. Two transcript variants encoding different isoforms have been found for this gene. 55636 ENSG00000171316 CHD7 chromodomain helicase DNA binding protein 7
NA ENSG00000229299 ENSG00000229299 RP4-583P15.10 NA
This gene encodes a mitochondrial protein that contains a BH3 domain and acts as a pro-apoptotic factor. The encoded protein interacts with anti-apoptotic proteins, including the E1B 19 kDa protein and Bcl2. This gene is silenced in tumors by DNA methylation. 664 ENSG00000176171 BNIP3 BCL2/adenovirus E1B 19kDa interacting protein 3
This gene encodes a secreted member of the phospholipase A2 (PLA2) class of enzymes, which is produced by the pancreatic acinar cells. The encoded calcium-dependent enzyme catalyzes the hydrolysis of the sn-2 position of membrane glycerophospholipids to release arachidonic acid (AA) and lysophospholipids. AA is subsequently converted by downstream metabolic enzymes to several bioactive lipophilic compounds (eicosanoids), including prostaglandins (PGs) and leukotrienes (LTs). The enzyme may be involved in several physiological processes including cell contraction, cell proliferation and pathological response. 5319 ENSG00000170890 PLA2G1B phospholipase A2 group IB
NA ENSG00000252464 ENSG00000252464 RN7SKP70 RNA, 7SK small nuclear pseudogene 70
NA 29923 ENSG00000135245 HILPDA hypoxia inducible lipid droplet associated
NA ENSG00000259407 ENSG00000259407 RP11-158M2.3 NA
The protein encoded by this gene is a member of the apolipoprotein L family and may play a role in lipid exchange and transport throughout the body, as well as in reverse cholesterol transport from peripheral cells to the liver. Two transcript variants encoding two different isoforms have been found for this gene. Only one of the isoforms appears to be a secreted protein. 80832 ENSG00000100336 APOL4 apolipoprotein L4
This gene encodes a member of the alpha/beta hydrolase superfamily. It is imprinted, exhibiting preferential expression from the paternal allele in fetal tissues, and isoform-specific imprinting in lymphocytes. The loss of imprinting of this gene has been linked to certain types of cancer and may be due to promoter switching. The encoded protein may play a role in development. Alternatively spliced transcript variants encoding multiple isoforms have been identified for this gene. Pseudogenes of this gene are located on the short arm of chromosomes 3 and 4, and the long arm of chromosomes 6 and 15. 4232 ENSG00000106484 MEST mesoderm specific transcript
NA ENSG00000270890 ENSG00000270890 RP3-468K18.6 NA
NA ENSG00000256469 ENSG00000256469 RP11-856F16.2 NA
The protein encoded by this gene belongs to the family of P-type cation transport ATPases, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The catalytic subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes an alpha 2 subunit. Mutations in this gene result in familial basilar or hemiplegic migraines, and in a rare syndrome known as alternating hemiplegia of childhood. 477 ENSG00000018625 ATP1A2 ATPase Na+/K+ transporting subunit alpha 2
The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. 2752 ENSG00000135821 GLUL glutamate-ammonia ligase
This gene encodes a member of the Polycomb-group (PcG) family. PcG family members form multimeric protein complexes, which are involved in maintaining the transcriptional repressive state of genes over successive cell generations. This protein associates with the embryonic ectoderm development protein, the VAV1 oncoprotein, and the X-linked nuclear protein. This protein may play a role in the hematopoietic and central nervous systems. Multiple alternatively spliced transcript variants encoding distinct isoforms have been identified for this gene. 2146 ENSG00000106462 EZH2 enhancer of zeste 2 polycomb repressive complex 2 subunit
This gene encodes transthyretin, one of the three prealbumins including alpha-1-antitrypsin, transthyretin and orosomucoid. Transthyretin is a carrier protein; it transports thyroid hormones in the plasma and cerebrospinal fluid, and also transports retinol (vitamin A) in the plasma. The protein consists of a tetramer of identical subunits. More than 80 different mutations in this gene have been reported; most mutations are related to amyloid deposition, affecting predominantly peripheral nerve and/or the heart, and a small portion of the gene mutations is non-amyloidogenic. The diseases caused by mutations include amyloidotic polyneuropathy, euthyroid hyperthyroxinaemia, amyloidotic vitreous opacities, cardiomyopathy, oculoleptomeningeal amyloidosis, meningocerebrovascular amyloidosis, carpal tunnel syndrome, etc. 7276 ENSG00000118271 TTR transthyretin
The protein encoded by this gene belongs to the ‘regulator of G protein signaling’ family. It inhibits signal transduction by increasing the GTPase activity of G protein alpha subunits. It also may play a role in regulating the kinetics of signaling in the phototransduction cascade. 6004 ENSG00000143333 RGS16 regulator of G-protein signaling 16
NA ENSG00000232956 ENSG00000232956 SNHG15 small nucleolar RNA host gene 15
The protein encoded by this gene belongs to the laminin family of secreted molecules. Laminins are heterotrimeric molecules that consist of alpha, beta, and gamma subunits that assemble through a coiled-coil domain. Laminins are essential for formation and function of the basement membrane and have additional functions in regulating cell migration and mechanical signal transduction. This gene encodes an alpha subunit and is responsive to several epithelial-mesenchymal regulators including keratinocyte growth factor, epidermal growth factor and insulin-like growth factor. Mutations in this gene have been identified as the cause of Herlitz type junctional epidermolysis bullosa and laryngoonychocutaneous syndrome. Alternative splicing and alternative promoter usage result in multiple transcript variants. 3909 ENSG00000053747 LAMA3 laminin subunit alpha 3
NA 729747 ENSG00000257446 ZNF878 zinc finger protein 878
NA ENSG00000226002 ENSG00000226002 GTF2IP14 general transcription factor IIi pseudogene 14
NA ENSG00000219470 ENSG00000219470 RP3-337H4.6 NA
NA ENSG00000258168 ENSG00000258168 RP11-588H23.3 NA
NA 404217 ENSG00000178531 CTXN1 cortexin 1
NA 259294 ENSG00000212124 TAS2R19 taste 2 receptor member 19
NA 144404 ENSG00000188735 TMEM120B transmembrane protein 120B
NA ENSG00000215032 ENSG00000215032 GNL3LP1 guanine nucleotide binding protein-like 3 (nucleolar)-like pseudogene 1
NA 101928371 ENSG00000225420 LOC101928371 uncharacterized LOC101928371
NA ENSG00000204677 ENSG00000204677 FAM153C family with sequence similarity 153 member C
The protein encoded by this gene, a member of the carnitine/choline acetyltransferase family, is the rate-controlling enzyme of the long-chain fatty acid beta-oxidation pathway in muscle mitochondria. This enzyme is required for the net transport of long-chain fatty acyl-CoAs from the cytoplasm into the mitochondria. Multiple transcript variants encoding different isoforms have been found for this gene, and read-through transcripts are expressed from the upstream locus that include exons from this gene. 1375 ENSG00000205560 CPT1B carnitine palmitoyltransferase 1B
TMEM97 is a conserved integral membrane protein that plays a role in controlling cellular cholesterol levels (Bartz et al., 2009 [PubMed 19583955]). 27346 ENSG00000109084 TMEM97 transmembrane protein 97
This gene encodes a cytoskeletal LIM protein that binds to actin filaments via a domain that is homologous to erythrocyte dematin. LIM domains, found in over 60 proteins, play key roles in the regulation of developmental pathways. LIM domains also function as protein-binding interfaces, mediating specific protein-protein interactions. The protein encoded by this gene could mediate such interactions between actin filaments and cytoplasmic targets. Alternatively spliced transcript variants encoding different isoforms have been identified. 3983 ENSG00000099204 ABLIM1 actin binding LIM protein 1
NA 284371 ENSG00000197608 ZNF841 zinc finger protein 841
The protein encoded by this gene is a member of the inorganic pyrophosphatase (PPase) family. PPases catalyze the hydrolysis of pyrophosphate to inorganic phosphate, which is important for the phosphate metabolism of cells. Studies of a similar protein in bovine suggested a cytoplasmic localization of this enzyme. 5464 ENSG00000180817 PPA1 pyrophosphatase (inorganic) 1
This gene encodes a member of the thioredoxin superfamily, a group of small, multifunctional redox-active proteins. Members of this family are characterized by a conserved active motif called the thioredoxin fold that catalyzes disulfide bond formation and isomerization. The encoded protein acts as a redox-dependent regulator of the Wnt signaling pathway and is involved in cell growth and differentiation. 64359 ENSG00000167693 NXN nucleoredoxin
NA 4502 ENSG00000125148 MT2A metallothionein 2A
NA 85456 ENSG00000149115 TNKS1BP1 tankyrase 1 binding protein 1
The gene is part of a 3-member transmembrane receptor kinase receptor family with a processed pseudogene distal on chromosome 15. The encoded protein is activated by the products of the growth arrest-specific gene 6 and protein S genes and is involved in controlling cell survival and proliferation, spermatogenesis, immunoregulation and phagocytosis. The encoded protein has also been identified as a cell entry factor for Ebola and Marburg viruses. 7301 ENSG00000092445 TYRO3 TYRO3 protein tyrosine kinase
DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of this family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. This gene encodes a DEAD box protein, which is an enzyme that possesses both ATPase and DNA helicase activities. This gene is a homolog of the yeast CHL1 gene, and may function to maintain chromosome transmission fidelity and genome stability. Alternative splicing results in multiple transcript variants encoding distinct isoforms. 1663 ENSG00000013573 DDX11 DEAD/H-box helicase 11
The protein encoded by this gene is one of a family of serine proteases that is secreted into the gastrointestinal tract as an inactive precursor, which is activated by proteolytic cleavage with trypsin. 1504 ENSG00000168925 CTRB1 chymotrypsinogen B1
NA 284047 ENSG00000154874 CCDC144B coiled-coil domain containing 144B (pseudogene)
The protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intra-cellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the MRP subfamily which is involved in multi-drug resistance. This family member plays a role in cellular detoxification as a pump for its substrate, organic anions. It may also function in prostaglandin-mediated cAMP signaling in ciliogenesis. Alternative splicing of this gene results in multiple transcript variants. 10257 ENSG00000125257 ABCC4 ATP binding cassette subfamily C member 4
This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. 5967 ENSG00000115386 REG1A regenerating family member 1 alpha
NA ENSG00000246250 ENSG00000246250 RP11-613D13.5 NA
NA ENSG00000231628 ENSG00000231628 RP3-355L5.4 NA
NA 140685 ENSG00000130584 ZBTB46 zinc finger and BTB domain containing 46
This gene encodes a protein with an arfaptin homology domain that is found both in the cytosol and as membrane-bound form on the Golgi complex and immature secretory granules. This protein is believed to be an autoantigen in insulin-dependent diabetes mellitus and primary Sjogren’s syndrome. Several transcript variants encoding two different isoforms have been found for this gene. 3382 ENSG00000003147 ICA1 islet cell autoantigen 1
NA ENSG00000236953 ENSG00000236953 ZDHHC20-IT1 ZDHHC20 intronic transcript 1
NA 9322 ENSG00000125733 TRIP10 thyroid hormone receptor interactor 10
NA ENSG00000111788 ENSG00000111788 RP11-22B23.1 NA
NA 5806 ENSG00000163661 PTX3 pentraxin 3
NA 23612 ENSG00000174307 PHLDA3 pleckstrin homology like domain family A member 3
NA ENSG00000219085 ENSG00000219085 NPM1P37 nucleophosmin 1 (nucleolar phosphoprotein B23, numatrin) pseudogene 37
NA ENSG00000270903 ENSG00000270903 HNRNPA3P9 heterogeneous nuclear ribonucleoprotein A3 pseudogene 9
The protein encoded by this gene contains five GTF2I-like repeats and each repeat possesses a potential helix-loop-helix (HLH) motif. It may have the ability to interact with other HLH-proteins and function as a transcription factor or as a positive transcriptional regulator under the control of Retinoblastoma protein. This gene plays a role in craniofacial and cognitive development and mutations have been associated with Williams-Beuren syndrome, a multisystem developmental disorder caused by deletion of multiple genes at 7q11.23. Alternative splicing results in multiple transcript variants. 9569 ENSG00000006704 GTF2IRD1 GTF2I repeat domain containing 1
NA 131583 ENSG00000185112 FAM43A family with sequence similarity 43 member A
NA ENSG00000250731 ENSG00000250731 TPM3P6 tropomyosin 3 pseudogene 6
The protein encoded by this gene belongs to the Rho family of the small GTPase superfamily. It contains a GTPase domain, a proline-rich region, a tandem of 2 BTB (broad complex, tramtrack, and bric-a-brac) domains, and a conserved C-terminal region. The protein plays a role in small GTPase-mediated signal transduction and the organization of the actin filament system. Alternate splicing results in multiple transcript variants. 9886 ENSG00000072422 RHOBTB1 Rho related BTB domain containing 1
NA ENSG00000270075 ENSG00000270075 RP11-127L20.5 NA
This gene encodes an inwardly rectifying K+ channel which may be blocked by divalent cations. This protein is thought to be one of multiple inwardly rectifying channels which contribute to the cardiac inward rectifier current (IK1). The gene is located within the Smith-Magenis syndrome region on chromosome 17. 3768 ENSG00000184185 KCNJ12 potassium voltage-gated channel subfamily J member 12
The protein encoded by this gene, together with spectrin and actin, constitutes the red cell membrane cytoskeletal network. This complex plays a critical role in erythrocyte shape and deformability. Mutations in this gene are associated with type 1 elliptocytosis (EL1). Alternatively spliced transcript variants encoding different isoforms have been described for this gene. 2035 ENSG00000159023 EPB41 erythrocyte membrane protein band 4.1
The protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intra-cellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the White subfamily. It is involved in macrophage cholesterol and phospholipids transport, and may regulate cellular lipid homeostasis in other cell types. Six alternative splice variants have been identified. 9619 ENSG00000160179 ABCG1 ATP binding cassette subfamily G member 1
Mitochondrial uncoupling proteins (UCP) are members of the larger family of mitochondrial anion carrier proteins (MACP). UCPs separate oxidative phosphorylation from ATP synthesis with energy dissipated as heat, also referred to as the mitochondrial proton leak. UCPs facilitate the transfer of anions from the inner to the outer mitochondrial membrane and the return transfer of protons from the outer to the inner mitochondrial membrane. They also reduce the mitochondrial membrane potential in mammalian cells. The different UCPs have tissue-specific expression; this gene is primarily expressed in skeletal muscle. This gene’s protein product is postulated to protect mitochondria against lipid-induced oxidative stress. Expression levels of this gene increase when fatty acid supplies to mitochondria exceed their oxidation capacity and the protein enables the export of fatty acids from mitochondria. UCPs contain the three solcar protein domains typically found in MACPs. Two splice variants have been found for this gene. 7352 ENSG00000175564 UCP3 uncoupling protein 3
This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. 5406 ENSG00000175535 PNLIP pancreatic lipase
NA ENSG00000243829 ENSG00000243829 CTB-33G10.1 NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_",3,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
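
The tables in this report suggest that the result of `mygene::queryMany` can differ between factors: the column order is not always the same, and a `notfound` column appears when some Ensembl IDs cannot be resolved. The following is a minimal sketch of how such a result could be tidied before it is exported or passed to `kable`. The helper name `clean_annotations` and the toy data frame are hypothetical and are not part of the original analysis; the toy rows only imitate the shape of `as.data.frame(out)`.

```r
# Hypothetical helper (not part of the original analysis):
# drop unresolved queries and put the columns in a fixed order.
clean_annotations <- function(out) {
  df <- as.data.frame(out)

  # Rows flagged notfound = TRUE carry no annotation; keep the rest.
  # notfound is NA for resolved queries, so NA counts as "found".
  if ("notfound" %in% names(df)) {
    df <- df[is.na(df$notfound) | !df$notfound, , drop = FALSE]
    df$notfound <- NULL
  }

  # Keep only the columns that are present, in a stable order.
  wanted <- c("query", "symbol", "X_id", "name", "summary")
  df[, intersect(wanted, names(df)), drop = FALSE]
}

# Toy input imitating the shape of a query result (illustrative values only).
toy <- data.frame(
  query    = c("ENSG00000136244", "ENSG00000179294"),
  symbol   = c("IL6", NA),
  X_id     = c("3569", NA),
  summary  = c("cytokine (summary shortened for this example)", NA),
  name     = c("interleukin 6", NA),
  notfound = c(NA, TRUE),
  stringsAsFactors = FALSE
)

cleaned <- clean_annotations(toy)
print(cleaned)
```

In the report itself, a call such as `kable(clean_annotations(out))` would render only the resolved genes. Whether unresolved IDs should be dropped or kept for later inspection is a judgment call left to the analyst.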

Factor 4 Annotations

out <- mygene::queryMany(gene_list[4,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query symbol X_id summary name notfound
ENSG00000136244 IL6 3569 This gene encodes a cytokine that functions in inflammation and the maturation of B cells. In addition, the encoded protein has been shown to be an endogenous pyrogen capable of inducing fever in people with autoimmune diseases or infections. The protein is primarily produced at sites of acute and chronic inflammation, where it is secreted into the serum and induces a transcriptional inflammatory response through interleukin 6 receptor, alpha. The functioning of this gene is implicated in a wide variety of inflammation-associated disease states, including susceptibility to diabetes mellitus and systemic juvenile rheumatoid arthritis. Alternative splicing results in multiple transcript variants. interleukin 6 NA
ENSG00000108342 CSF3 1440 The protein encoded by this gene is a cytokine that controls the production, differentiation, and function of granulocytes. The active protein is found extracellularly. Alternatively spliced transcript variants have been described for this gene. colony stimulating factor 3 NA
ENSG00000169429 CXCL8 3576 The protein encoded by this gene is a member of the CXC chemokine family. This chemokine is one of the major mediators of the inflammatory response. This chemokine is secreted by several cell types. It functions as a chemoattractant, and is also a potent angiogenic factor. This gene is believed to play a role in the pathogenesis of bronchiolitis, a common respiratory tract disease caused by viral infection. This gene and ten other members of the CXC chemokine gene family form a chemokine gene cluster in a region mapped to chromosome 4q. C-X-C motif chemokine ligand 8 NA
ENSG00000203785 SPRR2E 6704 This gene encodes a member of a family of small proline-rich proteins clustered in the epidermal differentiation complex on chromosome 1q21. The encoded protein, along with other family members, is a component of the cornified cell envelope that forms beneath the plasma membrane in terminally differentiated stratified squamous epithelia. This envelope serves as a barrier against extracellular and environmental factors. The seven SPRR2 genes (A-G) appear to have been homogenized by gene conversion compared to others in the cluster that exhibit greater differences in protein structure. small proline rich protein 2E NA
ENSG00000135447 PPP1R1A 5502 NA protein phosphatase 1 regulatory inhibitor subunit 1A NA
ENSG00000126233 SLURP1 57152 The protein encoded by this gene is a member of the Ly6/uPAR family but lacks a GPI-anchoring signal sequence. It is thought that this secreted protein contains antitumor activity. Mutations in this gene have been associated with Mal de Meleda, a rare autosomal recessive skin disorder. This gene maps to the same chromosomal region as several members of the Ly6/uPAR family of glycoprotein receptors. secreted LY6/PLAUR domain containing 1 NA
ENSG00000143153 ATP1B1 481 The protein encoded by this gene belongs to the family of Na+/K+ and H+/K+ ATPases beta chain proteins, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The beta subunit regulates, through assembly of alpha/beta heterodimers, the number of sodium pumps transported to the plasma membrane. The glycoprotein subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes a beta 1 subunit. Alternatively spliced transcript variants encoding different isoforms have been described, but their biological validity is not known. ATPase Na+/K+ transporting subunit beta 1 NA
ENSG00000106688 SLC1A1 6505 This gene encodes a member of the high-affinity glutamate transporters that play an essential role in transporting glutamate across plasma membranes. In brain, these transporters are crucial in terminating the postsynaptic action of the neurotransmitter glutamate, and in maintaining extracellular glutamate concentrations below neurotoxic levels. This transporter also transports aspartate, and mutations in this gene are thought to cause dicarboxylic aminoaciduria, also known as glutamate-aspartate transport defect. solute carrier family 1 member 1 NA
ENSG00000179294 NA NA NA NA TRUE
ENSG00000167191 GPRC5B 51704 This gene encodes a member of the type 3 G protein-coupled receptor family. Members of this superfamily are characterized by a signature 7-transmembrane domain motif. The encoded protein may modulate insulin secretion and increased protein expression is associated with type 2 diabetes. Alternative splicing results in multiple transcript variants. G protein-coupled receptor class C group 5 member B NA
ENSG00000198400 NTRK1 4914 This gene encodes a member of the neurotrophic tyrosine kinase receptor (NTKR) family. This kinase is a membrane-bound receptor that, upon neurotrophin binding, phosphorylates itself and members of the MAPK pathway. The presence of this kinase leads to cell differentiation and may play a role in specifying sensory neuron subtypes. Mutations in this gene have been associated with congenital insensitivity to pain, anhidrosis, self-mutilating behavior, mental retardation and cancer. Alternate transcriptional splice variants of this gene have been found, but only three have been characterized to date. neurotrophic receptor tyrosine kinase 1 NA
ENSG00000184557 SOCS3 9021 This gene encodes a member of the STAT-induced STAT inhibitor (SSI), also known as suppressor of cytokine signaling (SOCS), family. SSI family members are cytokine-inducible negative regulators of cytokine signaling. The expression of this gene is induced by various cytokines, including IL6, IL10, and interferon (IFN)-gamma. The protein encoded by this gene can bind to JAK2 kinase, and inhibit the activity of JAK2 kinase. Studies of the mouse counterpart of this gene suggested the roles of this gene in the negative regulation of fetal liver hematopoiesis, and placental development. suppressor of cytokine signaling 3 NA
ENSG00000267396 RP11-845C23.3 ENSG00000267396 NA NA NA
ENSG00000134548 SPX 80763 The protein encoded by this gene is a hormone involved in modulation of cardiovascular and renal function. It has also been shown in rats to cause weight loss. Several transcript variants have been found for this gene. spexin hormone NA
ENSG00000061656 SPAG4 6676 The mammalian sperm flagellum contains two cytoskeletal structures associated with the axoneme: the outer dense fibers surrounding the axoneme in the midpiece and principal piece and the fibrous sheath surrounding the outer dense fibers in the principal piece of the tail. Defects in these structures are associated with abnormal tail morphology, reduced sperm motility, and infertility. In the rat, the protein encoded by this gene associates with an outer dense fiber protein via a leucine zipper motif and localizes to the microtubules of the manchette and axoneme during sperm tail development. Alternative splicing results in multiple transcript variants encoding different isoforms. sperm associated antigen 4 NA
ENSG00000186847 KRT14 3861 This gene encodes a member of the keratin family, the most diverse group of intermediate filaments. This gene product, a type I keratin, is usually found as a heterotetramer with two keratin 5 molecules, a type II keratin. Together they form the cytoskeleton of epithelial cells. Mutations in the genes for these keratins are associated with epidermolysis bullosa simplex. At least one pseudogene has been identified at 17p12-p11. keratin 14 NA
ENSG00000263873 RP11-334E6.12 ENSG00000263873 NA NA NA
ENSG00000114115 RBP1 5947 This gene encodes the carrier protein involved in the transport of retinol (vitamin A alcohol) from the liver storage site to peripheral tissue. Vitamin A is a fat-soluble vitamin necessary for growth, reproduction, differentiation of epithelial tissues, and vision. Multiple transcript variants encoding different isoforms have been found for this gene. retinol binding protein 1 NA
ENSG00000154548 SRSF12 135295 NA serine and arginine rich splicing factor 12 NA
ENSG00000169474 SPRR1A 6698 NA small proline rich protein 1A NA
ENSG00000154096 THY1 7070 This gene encodes a cell surface glycoprotein and member of the immunoglobulin superfamily of proteins. The encoded protein is involved in cell adhesion and cell communication in numerous cell types, but particularly in cells of the immune and nervous systems. The encoded protein is widely used as a marker for hematopoietic stem cells. This gene may function as a tumor suppressor in nasopharyngeal carcinoma. Alternative splicing results in multiple transcript variants. Thy-1 cell surface antigen NA
ENSG00000128342 LIF 3976 The protein encoded by this gene is a pleiotropic cytokine with roles in several different systems. It is involved in the induction of hematopoietic differentiation in normal and myeloid leukemia cells, induction of neuronal cell differentiation, regulator of mesenchymal to epithelial conversion during kidney development, and may also have a role in immune tolerance at the maternal-fetal interface. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. leukemia inhibitory factor NA
ENSG00000132329 RAMP1 10267 The protein encoded by this gene is a member of the RAMP family of single-transmembrane-domain proteins, called receptor (calcitonin) activity modifying proteins (RAMPs). RAMPs are type I transmembrane proteins with an extracellular N terminus and a cytoplasmic C terminus. RAMPs are required to transport calcitonin-receptor-like receptor (CRLR) to the plasma membrane. CRLR, a receptor with seven transmembrane domains, can function as either a calcitonin-gene-related peptide (CGRP) receptor or an adrenomedullin receptor, depending on which members of the RAMP family are expressed. In the presence of this (RAMP1) protein, CRLR functions as a CGRP receptor. The RAMP1 protein is involved in the terminal glycosylation, maturation, and presentation of the CGRP receptor to the cell surface. Alternative splicing results in multiple transcript variants encoding different isoforms. receptor activity modifying protein 1 NA
ENSG00000163827 LRRC2 79442 This gene encodes a member of the leucine-rich repeat-containing family of proteins, which function in diverse biological pathways. This family member may possibly be a nuclear protein. Similarity to the RAS suppressor protein, as well as expression down-regulation observed in tumor cells, suggests that it may function as a tumor suppressor. The gene is located in the chromosome 3 common eliminated region 1 (C3CER1), a 1.4 Mb region that is commonly deleted in diverse tumors. A related pseudogene has been identified on chromosome 2. leucine rich repeat containing 2 NA
ENSG00000178934 LGALS7B 653499 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. Differential and in situ hybridization studies indicate that this lectin is specifically expressed in keratinocytes and found mainly in stratified squamous epithelium. A duplicate copy of this gene (GeneID:3963) is found adjacent to, but on the opposite strand on chromosome 19. galectin 7B NA
ENSG00000095203 EPB41L4B 54566 NA erythrocyte membrane protein band 4.1 like 4B NA
ENSG00000166823 MESP1 55897 NA mesoderm posterior bHLH transcription factor 1 NA
ENSG00000164929 BAALC 79870 This gene was identified by gene expression studies in patients with acute myeloid leukemia (AML). The gene is conserved among mammals and is not found in lower organisms. Tissues that express this gene develop from the neuroectoderm. Multiple alternatively spliced transcript variants that encode different proteins have been described for this gene; however, some of the transcript variants are found only in AML cell lines. brain and acute leukemia, cytoplasmic NA
ENSG00000169469 SPRR1B 6699 The protein encoded by this gene is an envelope protein of keratinocytes. The encoded protein is crosslinked to membrane proteins by transglutaminase, forming an insoluble layer under the plasma membrane. This protein is proline-rich and contains several tandem amino acid repeats. small proline rich protein 1B NA
ENSG00000258603 RP3-414A15.10 ENSG00000258603 NA NA NA
ENSG00000164976 KIAA1161 57462 NA KIAA1161 NA
ENSG00000008517 IL32 9235 This gene encodes a member of the cytokine family. The protein contains a tyrosine sulfation site, 3 potential N-myristoylation sites, multiple putative phosphorylation sites, and an RGD cell-attachment sequence. Expression of this protein is increased after the activation of T-cells by mitogens or the activation of NK cells by IL-2. This protein induces the production of TNFalpha from macrophage cells. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. interleukin 32 NA
ENSG00000167034 NKX3-1 4824 This gene encodes a homeobox-containing transcription factor. This transcription factor functions as a negative regulator of epithelial cell growth in prostate tissue. Aberrant expression of this gene is associated with prostate tumor progression. Alternate splicing results in multiple transcript variants of this gene. NK3 homeobox 1 NA
ENSG00000167772 ANGPTL4 51129 This gene encodes a glycosylated, secreted protein containing a C-terminal fibrinogen domain. The encoded protein is induced by peroxisome proliferation activators and functions as a serum hormone that regulates glucose homeostasis, lipid metabolism, and insulin sensitivity. This protein can also act as an apoptosis survival factor for vascular endothelial cells and can prevent metastasis by inhibiting vascular growth and tumor cell invasion. The C-terminal domain may be proteolytically-cleaved from the full-length secreted protein. Decreased expression of this gene has been associated with type 2 diabetes. Alternative splicing results in multiple transcript variants. This gene was previously referred to as ANGPTL2 but has been renamed ANGPTL4. angiopoietin like 4 NA
ENSG00000143556 S100A7 6278 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein differs from the other S100 proteins of known structure in its lack of calcium binding ability in one EF-hand at the N-terminus. The protein is overexpressed in hyperproliferative skin diseases, exhibits antimicrobial activities against bacteria and induces immunomodulatory activities. S100 calcium binding protein A7 NA
ENSG00000163734 CXCL3 2921 This antimicrobial gene encodes a member of the CXC subfamily of chemokines. The encoded protein is a secreted growth factor that signals through the G-protein coupled receptor, CXC receptor 2. This protein plays a role in inflammation and as a chemoattractant for neutrophils. C-X-C motif chemokine ligand 3 NA
ENSG00000167964 RAB26 25837 Members of the RAB protein family, including RAB26, are important regulators of vesicular fusion and trafficking. The RAB family of small G proteins regulates intercellular vesicle trafficking, including exocytosis, endocytosis, and recycling (summary by Seki et al., 2000 [PubMed 11043516]). RAB26, member RAS oncogene family NA
ENSG00000153904 DDAH1 23576 This gene belongs to the dimethylarginine dimethylaminohydrolase (DDAH) gene family. The encoded enzyme plays a role in nitric oxide generation by regulating cellular concentrations of methylarginines, which in turn inhibit nitric oxide synthase activity. dimethylarginine dimethylaminohydrolase 1 NA
ENSG00000081041 CXCL2 2920 This antimicrobial gene is part of a chemokine superfamily that encodes secreted proteins involved in immunoregulatory and inflammatory processes. The superfamily is divided into four subfamilies based on the arrangement of the N-terminal cysteine residues of the mature peptide. This chemokine, a member of the CXC subfamily, is expressed at sites of inflammation and may suppress hematopoietic progenitor cell proliferation. C-X-C motif chemokine ligand 2 NA
ENSG00000174564 IL20RB 53833 IL20RB and IL20RA (MIM 605620) form a heterodimeric receptor for interleukin-20 (IL20; MIM 605619) (Blumberg et al., 2001 [PubMed 11163236]). interleukin 20 receptor subunit beta NA
ENSG00000120738 EGR1 1958 The protein encoded by this gene belongs to the EGR family of C2H2-type zinc-finger proteins. It is a nuclear protein and functions as a transcriptional regulator. The products of target genes it activates are required for differentiation and mitogenesis. Studies suggest this is a cancer suppressor gene. early growth response 1 NA
ENSG00000257718 RP11-396F22.1 ENSG00000257718 NA NA NA
ENSG00000169509 CRCT1 54544 NA cysteine rich C-terminal 1 NA
ENSG00000080493 SLC4A4 8671 This gene encodes a sodium bicarbonate cotransporter (NBC) involved in the regulation of bicarbonate secretion and absorption and intracellular pH. Mutations in this gene are associated with proximal renal tubular acidosis. Multiple transcript variants encoding different isoforms have been found for this gene. solute carrier family 4 member 4 NA
ENSG00000178372 CALML5 51806 This gene encodes a novel calcium binding protein expressed in the epidermis and related to the calmodulin family of calcium binding proteins. Functional studies with recombinant protein demonstrate it does bind calcium and undergoes a conformational change when it does so. Abundant expression is detected only in reconstructed epidermis and is restricted to differentiating keratinocytes. In addition, it can associate with transglutaminase 3, shown to be a key enzyme in the terminal differentiation of keratinocytes. calmodulin like 5 NA
ENSG00000182902 SLC25A18 83733 NA solute carrier family 25 member 18 NA
ENSG00000175535 PNLIP 5406 This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. pancreatic lipase NA
ENSG00000251322 SHANK3 ENSG00000251322 NA SH3 and multiple ankyrin repeat domains 3 NA
ENSG00000104435 STMN2 11075 This gene encodes a member of the stathmin family of phosphoproteins. Stathmin proteins function in microtubule dynamics and signal transduction. The encoded protein plays a regulatory role in neuronal growth and is also thought to be involved in osteogenesis. Reductions in the expression of this gene have been associated with Down’s syndrome and Alzheimer’s disease. Alternatively spliced transcript variants have been observed for this gene. A pseudogene of this gene is located on the long arm of chromosome 6. stathmin 2 NA
ENSG00000053438 NNAT 4826 The protein encoded by this gene is a proteolipid that may be involved in the regulation of ion channels during brain development. The encoded protein may also play a role in forming and maintaining the structure of the nervous system. This gene is found within an intron of another gene, bladder cancer associated protein, but on the opposite strand. This gene is imprinted and is expressed only from the paternal allele. neuronatin NA
ENSG00000174669 SLC29A2 3177 The uptake of nucleosides by transporters, such as SLC29A2, is essential for nucleotide synthesis by salvage pathways in cells that lack de novo biosynthetic pathways. Nucleoside transport also plays a key role in the regulation of many physiologic processes through its effect on adenosine concentration at the cell surface (Griffiths et al., 1997 [PubMed 9396714]). solute carrier family 29 member 2 NA
ENSG00000184185 KCNJ12 3768 This gene encodes an inwardly rectifying K+ channel which may be blocked by divalent cations. This protein is thought to be one of multiple inwardly rectifying channels which contribute to the cardiac inward rectifier current (IK1). The gene is located within the Smith-Magenis syndrome region on chromosome 17. potassium voltage-gated channel subfamily J member 12 NA
ENSG00000101187 SLCO4A1 28231 NA solute carrier organic anion transporter family member 4A1 NA
ENSG00000152463 OLAH 55301 NA oleoyl-ACP hydrolase NA
ENSG00000154277 UCHL1 7345 The protein encoded by this gene belongs to the peptidase C12 family. This enzyme is a thiol protease that hydrolyzes a peptide bond at the C-terminal glycine of ubiquitin. This gene is specifically expressed in the neurons and in cells of the diffuse neuroendocrine system. Mutations in this gene may be associated with Parkinson disease. ubiquitin C-terminal hydrolase L1 NA
ENSG00000168811 IL12A 3592 This gene encodes a subunit of a cytokine that acts on T and natural killer cells, and has a broad array of biological activities. The cytokine is a disulfide-linked heterodimer composed of the 35-kD subunit encoded by this gene, and a 40-kD subunit that is a member of the cytokine receptor family. This cytokine is required for the T-cell-independent induction of interferon (IFN)-gamma, and is important for the differentiation of both Th1 and Th2 cells. The responses of lymphocytes to this cytokine are mediated by the activator of transcription protein STAT4. Nitric oxide synthase 2A (NOS2A/NOS2) is found to be required for the signaling process of this cytokine in innate immunity. interleukin 12A NA
ENSG00000164638 SLC29A4 222962 This gene encodes a member of the SLC29A/ENT transporter protein family. The encoded membrane protein catalyzes the reuptake of monoamines into presynaptic neurons, thus determining the intensity and duration of monoamine neural signaling. It has been shown to transport several compounds, including serotonin, dopamine, and the neurotoxin 1-methyl-4-phenylpyridinium. Alternative splicing results in multiple transcript variants. solute carrier family 29 member 4 NA
ENSG00000198576 ARC 23237 NA activity-regulated cytoskeleton-associated protein NA
ENSG00000070087 PFN2 5217 The protein encoded by this gene is a ubiquitous actin monomer-binding protein belonging to the profilin family. It is thought to regulate actin polymerization in response to extracellular signals. There are two alternatively spliced transcript variants encoding different isoforms described for this gene. profilin 2 NA
ENSG00000130208 APOC1 341 This gene encodes a member of the apolipoprotein C1 family. This gene is expressed primarily in the liver, and it is activated when monocytes differentiate into macrophages. The encoded protein plays a central role in high density lipoprotein (HDL) and very low density lipoprotein (VLDL) metabolism. This protein has also been shown to inhibit cholesteryl ester transfer protein in plasma. A pseudogene of this gene is located 4 kb downstream in the same orientation, on the same chromosome. This gene is mapped to chromosome 19, where it resides within an apolipoprotein gene cluster. apolipoprotein C1 NA
ENSG00000167768 KRT1 3848 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 1 NA
ENSG00000154217 PITPNC1 26207 This gene encodes a member of the phosphatidylinositol transfer protein family. The encoded cytoplasmic protein plays a role in multiple processes including cell signaling and lipid metabolism by facilitating the transfer of phosphatidylinositol between membrane compartments. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene, and a pseudogene of this gene is located on the long arm of chromosome 1. phosphatidylinositol transfer protein, cytoplasmic 1 NA
ENSG00000123685 BATF3 55509 This gene encodes a member of the basic leucine zipper protein family. The encoded protein functions as a transcriptional repressor when heterodimerizing with JUN. The protein may play a role in repression of interleukin-2 and matrix metalloproteinase-1 transcription. basic leucine zipper ATF-like transcription factor 3 NA
ENSG00000172016 REG3A 5068 This gene encodes a pancreatic secretory protein that may be involved in cell proliferation or differentiation. It has similarity to the C-type lectin superfamily. The enhanced expression of this gene is observed during pancreatic inflammation and liver carcinogenesis. The mature protein also functions as an antimicrobial protein with antibacterial activity. Alternate splicing results in multiple transcript variants that encode the same protein. regenerating family member 3 alpha NA
ENSG00000137825 ITPKA 3706 Regulates inositol phosphate metabolism by phosphorylation of second messenger inositol 1,4,5-trisphosphate to Ins(1,3,4,5)P4. The activity of the inositol 1,4,5-trisphosphate 3-kinase is responsible for regulating the levels of a large number of inositol polyphosphates that are important in cellular signaling. Both calcium/calmodulin and protein phosphorylation mechanisms control its activity. It is also a substrate for the cyclic AMP-dependent protein kinase, calcium/calmodulin-dependent protein kinase II, and protein kinase C in vitro. inositol-trisphosphate 3-kinase A NA
ENSG00000167656 LY6D 8581 NA lymphocyte antigen 6 complex, locus D NA
ENSG00000163739 CXCL1 2919 This antimicrobial gene encodes a member of the CXC subfamily of chemokines. The encoded protein is a secreted growth factor that signals through the G-protein coupled receptor, CXC receptor 2. This protein plays a role in inflammation and as a chemoattractant for neutrophils. Aberrant expression of this protein is associated with the growth and progression of certain tumors. A naturally occurring processed form of this protein has increased chemotactic activity. Alternate splicing results in coding and non-coding variants of this gene. A pseudogene of this gene is found on chromosome 4. C-X-C motif chemokine ligand 1 NA
ENSG00000124191 TOX2 84969 NA TOX high mobility group box family member 2 NA
ENSG00000155090 KLF10 7071 This gene encodes a member of a family of proteins that feature C2H2-type zinc finger domains. The encoded protein is a transcriptional repressor that acts as an effector of transforming growth factor beta signaling. Activity of this protein may inhibit the growth of cancers, particularly pancreatic cancer. Alternative splicing results in multiple transcript variants. Kruppel like factor 10 NA
ENSG00000232803 SLCO4A1-AS1 100127888 NA SLCO4A1 antisense RNA 1 NA
ENSG00000161281 COX7A1 1346 Cytochrome c oxidase (COX), the terminal component of the mitochondrial respiratory chain, catalyzes the electron transfer from reduced cytochrome c to oxygen. This component is a heteromeric complex consisting of 3 catalytic subunits encoded by mitochondrial genes and multiple structural subunits encoded by nuclear genes. The mitochondrially-encoded subunits function in electron transfer, and the nuclear-encoded subunits may function in the regulation and assembly of the complex. This nuclear gene encodes polypeptide 1 (muscle isoform) of subunit VIIa and the polypeptide 1 is present only in muscle tissues. Other polypeptides of subunit VIIa are present in both muscle and nonmuscle tissues, and are encoded by different genes. cytochrome c oxidase subunit 7A1 NA
ENSG00000168309 FAM107A 11170 NA family with sequence similarity 107 member A NA
ENSG00000267473 NA NA NA NA TRUE
ENSG00000174938 SEZ6L2 26470 This gene encodes a seizure-related protein that is localized on the cell surface. The gene is located in a region of chromosome 16p11.2 that is thought to contain candidate genes for autism spectrum disorders (ASD), though there is no evidence directly implicating this gene in ASD. Increased expression of this gene has been found in lung cancers, and the protein is therefore considered to be a novel prognostic marker for lung cancer. Alternative splicing of this gene results in multiple transcript variants. seizure related 6 homolog like 2 NA
ENSG00000152154 TMEM178A 130733 NA transmembrane protein 178A NA
ENSG00000167588 GPD1 2819 This gene encodes a member of the NAD-dependent glycerol-3-phosphate dehydrogenase family. The encoded protein plays a critical role in carbohydrate and lipid metabolism by catalyzing the reversible conversion of dihydroxyacetone phosphate (DHAP) and reduced nicotinamide adenine dinucleotide (NADH) to glycerol-3-phosphate (G3P) and NAD+. The encoded cytosolic protein and mitochondrial glycerol-3-phosphate dehydrogenase also form a glycerol phosphate shuttle that facilitates the transfer of reducing equivalents from the cytosol to mitochondria. Mutations in this gene are a cause of transient infantile hypertriglyceridemia. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. glycerol-3-phosphate dehydrogenase 1 NA
ENSG00000204323 SMIM5 643008 NA small integral membrane protein 5 NA
ENSG00000164879 CA3 761 Carbonic anhydrase III (CAIII) is a member of a multigene family (at least six separate genes are known) that encodes carbonic anhydrase isozymes. These carbonic anhydrases are a class of metalloenzymes that catalyze the reversible hydration of carbon dioxide and are differentially expressed in a number of cell types. The expression of the CA3 gene is strictly tissue specific and present at high levels in skeletal muscle and much lower levels in cardiac and smooth muscle. A proportion of carriers of Duchenne muscular dystrophy have a higher CA3 level than normal. The gene spans 10.3 kb and contains seven exons and six introns. carbonic anhydrase 3 NA
ENSG00000104369 JPH1 56704 Junctional complexes between the plasma membrane and endoplasmic/sarcoplasmic reticulum are a common feature of all excitable cell types and mediate cross talk between cell surface and intracellular ion channels. The protein encoded by this gene is a component of junctional complexes and is composed of a C-terminal hydrophobic segment spanning the endoplasmic/sarcoplasmic reticulum membrane and a remaining cytoplasmic domain that shows specific affinity for the plasma membrane. This gene is a member of the junctophilin gene family. junctophilin 1 NA
ENSG00000177542 SLC25A22 79751 This gene encodes a mitochondrial glutamate carrier. Mutations in this gene are associated with early infantile epileptic encephalopathy. Multiple alternatively spliced variants, encoding the same protein, have been identified. solute carrier family 25 member 22 NA
ENSG00000182230 LOC100507387 100507387 NA uncharacterized LOC100507387 NA
ENSG00000182230 FAM153B 202134 NA family with sequence similarity 153 member B NA
ENSG00000260025 RP11-490M8.1 ENSG00000260025 NA NA NA
ENSG00000105376 ICAM5 7087 The protein encoded by this gene is a member of the intercellular adhesion molecule (ICAM) family. All ICAM proteins are type I transmembrane glycoproteins, contain 2-9 immunoglobulin-like C2-type domains, and bind to the leukocyte adhesion LFA-1 protein. This protein is expressed on the surface of telencephalic neurons and displays two types of adhesion activity, homophilic binding between neurons and heterophilic binding between neurons and leukocytes. It may be a critical component in neuron-microglial cell interactions in the course of normal development or as part of neurodegenerative diseases. intercellular adhesion molecule 5 NA
ENSG00000151014 NOCT 25819 The protein encoded by this gene is highly similar to Nocturnin, a gene identified as a circadian clock regulated gene in Xenopus laevis. This protein and Nocturnin protein share similarity with the C-terminal domain of a yeast transcription factor, carbon catabolite repression 4 (CCR4). The mRNA abundance of a similar gene in mouse has been shown to exhibit circadian rhythmicity, which suggests a role for this protein in clock function or as a circadian clock effector. nocturnin NA
ENSG00000111012 CYP27B1 1594 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. The protein encoded by this gene localizes to the inner mitochondrial membrane where it hydroxylates 25-hydroxyvitamin D3 at the 1alpha position. This reaction synthesizes 1alpha,25-dihydroxyvitamin D3, the active form of vitamin D3, which binds to the vitamin D receptor and regulates calcium metabolism. Thus this enzyme regulates the level of biologically active vitamin D and plays an important role in calcium homeostasis. Mutations in this gene can result in vitamin D-dependent rickets type I. cytochrome P450 family 27 subfamily B member 1 NA
ENSG00000253549 CA3-AS1 100996348 NA CA3 antisense RNA 1 NA
ENSG00000163141 BNIPL 149428 The protein encoded by this gene interacts with several other proteins, such as BCL2, ARHGAP1, MIF and GFER. It may function as a bridge molecule between BCL2 and ARHGAP1/CDC42 in promoting cell death. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. BCL2/adenovirus E1B 19kD interacting protein like NA
ENSG00000185774 KCNIP4 80333 This gene encodes a member of the family of voltage-gated potassium (Kv) channel-interacting proteins (KCNIPs), which belong to the recoverin branch of the EF-hand superfamily. Members of the KCNIP family are small calcium binding proteins. They all have EF-hand-like domains, and differ from each other in the N-terminus. They are integral subunit components of native Kv4 channel complexes. They may regulate A-type currents, and hence neuronal excitability, in response to changes in intracellular calcium. This protein member also interacts with presenilin. Multiple alternatively spliced transcript variants encoding distinct isoforms have been identified for this gene. potassium voltage-gated channel interacting protein 4 NA
ENSG00000145362 ANK2 287 This gene encodes a member of the ankyrin family of proteins that link the integral membrane proteins to the underlying spectrin-actin cytoskeleton. Ankyrins play key roles in activities such as cell motility, activation, proliferation, contact and the maintenance of specialized membrane domains. Most ankyrins are typically composed of three structural domains: an amino-terminal domain containing multiple ankyrin repeats; a central region with a highly conserved spectrin binding domain; and a carboxy-terminal regulatory domain which is the least conserved and subject to variation. The protein encoded by this gene is required for targeting and stability of Na/Ca exchanger 1 in cardiomyocytes. Mutations in this gene cause long QT syndrome 4 and cardiac arrhythmia syndrome. Multiple transcript variants encoding different isoforms have been described. ankyrin 2, neuronal NA
ENSG00000120694 HSPH1 10808 NA heat shock protein family H (Hsp110) member 1 NA
ENSG00000010438 PRSS3 5646 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is expressed in the brain and pancreas and is resistant to common trypsin inhibitors. It is active on peptide linkages involving the carboxyl group of lysine or arginine. This gene is localized to the locus of T cell receptor beta variable orphans on chromosome 9. Four transcript variants encoding different isoforms have been described for this gene. protease, serine 3 NA
ENSG00000112096 LOC100129518 100129518 NA uncharacterized LOC100129518 NA
ENSG00000112096 SOD2 6648 This gene is a member of the iron/manganese superoxide dismutase family. It encodes a mitochondrial protein that forms a homotetramer and binds one manganese ion per subunit. This protein binds to the superoxide byproducts of oxidative phosphorylation and converts them to hydrogen peroxide and diatomic oxygen. Mutations in this gene have been associated with idiopathic cardiomyopathy (IDC), premature aging, sporadic motor neuron disease, and cancer. Alternative splicing of this gene results in multiple transcript variants. A related pseudogene has been identified on chromosome 1. superoxide dismutase 2, mitochondrial NA
ENSG00000144369 FAM171B 165215 NA family with sequence similarity 171 member B NA
ENSG00000187193 MT1X 4501 NA metallothionein 1X NA
ENSG00000079435 LIPE 3991 The protein encoded by this gene has a long and a short form, generated by use of alternative translational start codons. The long form is expressed in steroidogenic tissues such as testis, where it converts cholesteryl esters to free cholesterol for steroid hormone production. The short form is expressed in adipose tissue, among others, where it hydrolyzes stored triglycerides to free fatty acids. lipase E, hormone sensitive type NA
ENSG00000076555 ACACB 32 Acetyl-CoA carboxylase (ACC) is a complex multifunctional enzyme system. ACC is a biotin-containing enzyme which catalyzes the carboxylation of acetyl-CoA to malonyl-CoA, the rate-limiting step in fatty acid synthesis. ACC-beta is thought to control fatty acid oxidation by means of the ability of malonyl-CoA to inhibit carnitine-palmitoyl-CoA transferase I, the rate-limiting step in fatty acid uptake and oxidation by mitochondria. ACC-beta may be involved in the regulation of fatty acid oxidation, rather than fatty acid biosynthesis. There is evidence for the presence of two ACC-beta isoforms. acetyl-CoA carboxylase beta NA
ENSG00000124253 PCK1 5105 This gene is a main control point for the regulation of gluconeogenesis. The cytosolic enzyme encoded by this gene, along with GTP, catalyzes the formation of phosphoenolpyruvate from oxaloacetate, with the release of carbon dioxide and GDP. The expression of this gene can be regulated by insulin, glucocorticoids, glucagon, cAMP, and diet. Defects in this gene are a cause of cytosolic phosphoenolpyruvate carboxykinase deficiency. A mitochondrial isozyme of the encoded protein also has been characterized. phosphoenolpyruvate carboxykinase 1 NA
ENSG00000106976 DNM1 1759 This gene encodes a member of the dynamin subfamily of GTP-binding proteins. The encoded protein possesses unique mechanochemical properties used to tubulate and sever membranes, and is involved in clathrin-mediated endocytosis and other vesicular trafficking processes. Actin and other cytoskeletal proteins act as binding partners for the encoded protein, which can also self-assemble leading to stimulation of GTPase activity. More than sixty highly conserved copies of the 3’ region of this gene are found elsewhere in the genome, particularly on chromosomes Y and 15. Alternatively spliced transcript variants encoding different isoforms have been described. dynamin 1 NA
ENSG00000059915 PSD 5662 This gene encodes a Plekstrin homology and SEC7 domains-containing protein that functions as a guanine nucleotide exchange factor. The encoded protein regulates signal transduction by activating ADP-ribosylation factor 6. Alternative splicing results in multiple transcript variants. pleckstrin and Sec7 domain containing NA
ENSG00000184524 CEND1 51286 The protein encoded by this gene is a neuron-specific protein. The similar protein in pig enhances neuroblastoma cell differentiation in vitro and may be involved in neuronal differentiation in vivo. Multiple pseudogenes have been reported for this gene. cell cycle exit and neuronal differentiation 1 NA
# Save the queried Ensembl IDs for factor 4, one ID per line
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 4, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
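
The export step above is repeated for every factor with a hard-coded index. A minimal sketch of a reusable helper follows. The name `write_factor_genes` and its arguments are hypothetical and not part of the original analysis. It also assumes that dropping repeated query IDs is acceptable downstream; repeats arise when `queryMany` returns more than one hit for the same Ensembl ID.

```r
# Hypothetical helper (not in the original analysis): write the query IDs
# of one factor to "gene_names_clus_<k>.txt", one ID per line.
write_factor_genes <- function(query_ids, k, out_dir) {
  # Assumption: an ID returned several times should be written only once.
  ids <- unique(as.character(query_ids))
  out_file <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(ids, out_file,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(out_file)
}

# Toy demonstration in a temporary directory; the IDs are taken from the
# factor 5 table, where ENSG00000108379 appears in two rows.
demo_ids  <- c("ENSG00000108379", "ENSG00000108379", "ENSG00000272084")
demo_file <- write_factor_genes(demo_ids, k = 5, out_dir = tempdir())
demo_lines <- readLines(demo_file)
```

In the report itself the call would presumably look like `write_factor_genes(out$query, 5, "../utilities/GTEX2013_sparse_load_voom")`, although whether the downstream tools expect de-duplicated IDs is not stated in the source.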

Factor 5 Annotations

out <- mygene::queryMany(gene_list[5, ], scopes = "ensembl.gene",
                         fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id summary query name notfound
NKD2 85409 This gene encodes a member of a family of proteins that function as negative regulators of Wnt receptor signaling through interaction with Dishevelled family members. The encoded protein participates in the delivery of transforming growth factor alpha-containing vesicles to the cell membrane. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000145506 naked cuticle homolog 2 NA
AIFM3 150209 NA ENSG00000183773 apoptosis inducing factor, mitochondria associated 3 NA
PSD 5662 This gene encodes a Plekstrin homology and SEC7 domains-containing protein that functions as a guanine nucleotide exchange factor. The encoded protein regulates signal transduction by activating ADP-ribosylation factor 6. Alternative splicing results in multiple transcript variants. ENSG00000059915 pleckstrin and Sec7 domain containing NA
THY1 7070 This gene encodes a cell surface glycoprotein and member of the immunoglobulin superfamily of proteins. The encoded protein is involved in cell adhesion and cell communication in numerous cell types, but particularly in cells of the immune and nervous systems. The encoded protein is widely used as a marker for hematopoietic stem cells. This gene may function as a tumor suppressor in nasopharyngeal carcinoma. Alternative splicing results in multiple transcript variants. ENSG00000154096 Thy-1 cell surface antigen NA
BCHE 590 Mutant alleles at the BCHE locus are responsible for suxamethonium sensitivity. Homozygous persons sustain prolonged apnea after administration of the muscle relaxant suxamethonium in connection with surgical anesthesia. The activity of pseudocholinesterase in the serum is low and its substrate behavior is atypical. In the absence of the relaxant, the homozygote is at no known disadvantage. ENSG00000114200 butyrylcholinesterase NA
RP11-334E6.12 ENSG00000263873 NA ENSG00000263873 NA NA
TACR2 6865 This gene belongs to a family of genes that function as receptors for tachykinins. Receptor affinities are specified by variations in the 5’-end of the sequence. The receptors belonging to this family are characterized by interactions with G proteins and 7 hydrophobic transmembrane regions. This gene encodes the receptor for the tachykinin neuropeptide substance K, also referred to as neurokinin A. ENSG00000075073 tachykinin receptor 2 NA
FUT2 2524 The protein encoded by this gene is a Golgi stack membrane protein that is involved in the creation of a precursor of the H antigen, which is required for the final step in the soluble A and B antigen synthesis pathway. This gene is one of two encoding the galactoside 2-L-fucosyltransferase enzyme. Two transcript variants encoding the same protein have been found for this gene. ENSG00000176920 fucosyltransferase 2 NA
S100A14 57402 This gene encodes a member of the S100 protein family which contains an EF-hand motif and binds calcium. The gene is located in a cluster of S100 genes on chromosome 1. Levels of the encoded protein have been found to be lower in cancerous tissue and associated with metastasis suggesting a tumor suppressor function (PMID: 19956863, 19351828). ENSG00000189334 S100 calcium binding protein A14 NA
PADI2 11240 This gene encodes a member of the peptidyl arginine deiminase family of enzymes, which catalyze the post-translational deimination of proteins by converting arginine residues into citrullines in the presence of calcium ions. The family members have distinct substrate specificities and tissue-specific expression patterns. The type II enzyme is the most widely expressed family member. Known substrates for this enzyme include myelin basic protein in the central nervous system and vimentin in skeletal muscle and macrophages. This enzyme is thought to play a role in the onset and progression of neurodegenerative human disorders, including Alzheimer disease and multiple sclerosis, and it has also been implicated in glaucoma pathogenesis. This gene exists in a cluster with four other paralogous genes. ENSG00000117115 peptidyl arginine deiminase 2 NA
NA NA NA ENSG00000257499 NA TRUE
SPINT1 6692 The protein encoded by this gene is a member of the Kunitz family of serine protease inhibitors. The protein is a potent inhibitor specific for HGF activator and is thought to be involved in the regulation of the proteolytic activation of HGF in injured tissues. Alternative splicing results in multiple variants encoding different isoforms. ENSG00000166145 serine peptidase inhibitor, Kunitz type 1 NA
CASZ1 54897 The protein encoded by this gene is a zinc finger transcription factor. The encoded protein may function as a tumor suppressor, and single nucleotide polymorphisms in this gene are associated with blood pressure variation. Alternative splicing results in multiple transcript variants that encode different protein isoforms. ENSG00000130940 castor zinc finger 1 NA
AGAP2 116986 The protein encoded by this gene belongs to the centaurin gamma-like family. It mediates anti-apoptotic effects of nerve growth factor by activating nuclear phosphoinositide 3-kinase. It is overexpressed in cancer cells, and promotes cancer cell invasion. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. ENSG00000135439 ArfGAP with GTPase domain, ankyrin repeat and PH domain 2 NA
SYNM 23336 The protein encoded by this gene is an intermediate filament (IF) family member. IF proteins are cytoskeletal proteins that confer resistance to mechanical stress and are encoded by a dispersed multigene family. This protein has been found to form a linkage between desmin, which is a subunit of the IF network, and the extracellular matrix, and provides an important structural support in muscle. Two alternatively spliced variants encoding different isoforms have been described for this gene. ENSG00000182253 synemin NA
ANO1-AS1 ENSG00000254902 NA ENSG00000254902 ANO1 antisense RNA 1 NA
FAM46B 115572 NA ENSG00000158246 family with sequence similarity 46 member B NA
GLB1L2 89944 NA ENSG00000149328 galactosidase beta 1 like 2 NA
CYP4F29P 54055 NA ENSG00000228314 cytochrome P450 family 4 subfamily F member 29, pseudogene NA
PTK6 5753 The protein encoded by this gene is a cytoplasmic nonreceptor protein kinase which may function as an intracellular signal transducer in epithelial tissues. Overexpression of this gene in mammary epithelial cells leads to sensitization of the cells to epidermal growth factor and results in a partially transformed phenotype. Expression of this gene has been detected at low levels in some breast tumors but not in normal breast tissue. The encoded protein has been shown to undergo autophosphorylation. Alternative splicing results in multiple transcript variants. ENSG00000101213 protein tyrosine kinase 6 NA
CIART 148523 NA ENSG00000159208 circadian associated repressor of transcription NA
CNN1 1264 NA ENSG00000130176 calponin 1 NA
RORA 6095 The protein encoded by this gene is a member of the NR1 subfamily of nuclear hormone receptors. It can bind as a monomer or as a homodimer to hormone response elements upstream of several genes to enhance the expression of those genes. The encoded protein has been shown to interact with NM23-2, a nucleoside diphosphate kinase involved in organogenesis and differentiation, as well as with NM23-1, the product of a tumor metastasis suppressor candidate gene. Also, it has been shown to aid in the transcriptional regulation of some genes involved in circadian rhythm. Four transcript variants encoding different isoforms have been described for this gene. ENSG00000069667 RAR related orphan receptor A NA
TMEM52 339456 NA ENSG00000178821 transmembrane protein 52 NA
EGR3 1960 This gene encodes a transcriptional regulator that belongs to the EGR family of C2H2-type zinc-finger proteins. It is an immediate-early growth response gene which is induced by mitogenic stimulation. The protein encoded by this gene participates in the transcriptional regulation of genes in controling biological rhythm. It may also play a role in a wide variety of processes including muscle development, lymphocyte development, endothelial cell growth and migration, and neuronal development. Alternative splicing results in multiple transcript variants encoding distinct isoforms. ENSG00000179388 early growth response 3 NA
SORBS1 10580 This gene encodes a CBL-associated protein which functions in the signaling and stimulation of insulin. Mutations in this gene may be associated with human disorders of insulin resistance. Alternative splicing results in multiple transcript variants. ENSG00000095637 sorbin and SH3 domain containing 1 NA
PROM2 150696 This gene encodes a member of the prominin family of pentaspan membrane glycoproteins. The encoded protein localizes to basal epithelial cells and may be involved in the organization of plasma membrane microdomains. Alternative splicing results in multiple transcript variants. ENSG00000155066 prominin 2 NA
AMOTL1 154810 The protein encoded by this gene is a peripheral membrane protein that is a component of tight junctions or TJs. TJs form an apical junctional structure and act to control paracellular permeability and maintain cell polarity. This protein is related to angiomotin, an angiostatin binding protein that regulates endothelial cell migration and capillary formation. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000166025 angiomotin like 1 NA
DUOX1 53905 The protein encoded by this gene is a glycoprotein and a member of the NADPH oxidase family. The synthesis of thyroid hormone is catalyzed by a protein complex located at the apical membrane of thyroid follicular cells. This complex contains an iodide transporter, thyroperoxidase, and a peroxide generating system that includes proteins encoded by this gene and the similar DUOX2 gene. This protein is known as dual oxidase because it has both a peroxidase homology domain and a gp91phox domain. This protein generates hydrogen peroxide and thereby plays a role in the activity of thyroid peroxidase, lactoperoxidase, and in lactoperoxidase-mediated antimicrobial defense at mucosal surfaces. Two alternatively spliced transcript variants encoding the same protein have been described for this gene. ENSG00000137857 dual oxidase 1 NA
RP11-650L12.2 ENSG00000261762 NA ENSG00000261762 NA NA
AKAP1 8165 The A-kinase anchor proteins (AKAPs) are a group of structurally diverse proteins, which have the common function of binding to the regulatory subunit of protein kinase A (PKA) and confining the holoenzyme to discrete locations within the cell. This gene encodes a member of the AKAP family. The encoded protein binds to type I and type II regulatory subunits of PKA and anchors them to the mitochondrion. This protein is speculated to be involved in the cAMP-dependent signal transduction pathway and in directing RNA to a specific cellular compartment. ENSG00000121057 A-kinase anchoring protein 1 NA
TPSAB1 7177 Tryptases comprise a family of trypsin-like serine proteases, the peptidase family S1. Tryptases are enzymatically active only as heparin-stabilized tetramers, and they are resistant to all known endogenous proteinase inhibitors. Several tryptase genes are clustered on chromosome 16p13.3. These genes are characterized by several distinct features. They have a highly conserved 3’ UTR and contain tandem repeat sequences at the 5’ flank and 3’ UTR which are thought to play a role in regulation of the mRNA stability. These genes have an intron immediately upstream of the initiator Met codon, which separates the site of transcription initiation from protein coding sequence. This feature is characteristic of tryptases but is unusual in other genes. The alleles of this gene exhibit an unusual amount of sequence variation, such that the alleles were once thought to represent two separate genes, alpha and beta 1. Beta tryptases appear to be the main isoenzymes expressed in mast cells; whereas in basophils, alpha tryptases predominate. Tryptases have been implicated as mediators in the pathogenesis of asthma and other allergic and inflammatory disorders. ENSG00000172236 tryptase alpha/beta 1 NA
RP11-290F24.6 ENSG00000267940 NA ENSG00000267940 NA NA
CPLX1 10815 Proteins encoded by the complexin/synaphin gene family are cytosolic proteins that function in synaptic vesicle exocytosis. These proteins bind syntaxin, part of the SNAP receptor. The protein product of this gene binds to the SNAP receptor complex and disrupts it, allowing transmitter release. ENSG00000168993 complexin 1 NA
WFDC3 140686 This gene encodes a member of the WAP-type four-disulfide core (WFDC) domain family. The WFDC domain, or WAP signature motif, contains eight cysteines forming four disulfide bonds at the core of the protein, and functions as a protease inhibitor. The encoded protein contains four WFDC domains. Most WFDC genes are localized to chromosome 20q12-q13 in two clusters: centromeric and telomeric. This gene belongs to the telomeric cluster. Alternatively spliced transcript variants have been observed but their full-length nature has not been determined. ENSG00000124116 WAP four-disulfide core domain 3 NA
RP11-6O2.4 ENSG00000261054 NA ENSG00000261054 NA NA
CCDC85C 317762 NA ENSG00000205476 coiled-coil domain containing 85C NA
RP11-856F16.2 ENSG00000256469 NA ENSG00000256469 NA NA
RP11-64B16.2 ENSG00000213144 NA ENSG00000213144 NA NA
TGM1 7051 The protein encoded by this gene is a membrane protein that catalyzes the addition of an alkyl group from an akylamine to a glutamine residue of a protein, forming an alkylglutamine in the protein. This protein alkylation leads to crosslinking of proteins and catenation of polyamines to proteins. This gene contains either one or two copies of a 22 nt repeat unit in its 3’ UTR. Mutations in this gene have been associated with autosomal recessive lamellar ichthyosis (LI) and nonbullous congenital ichthyosiform erythroderma (NCIE). ENSG00000092295 transglutaminase 1 NA
BAG3 9531 BAG proteins compete with Hip for binding to the Hsc70/Hsp70 ATPase domain and promote substrate release. All the BAG proteins have an approximately 45-amino acid BAG domain near the C terminus but differ markedly in their N-terminal regions. The protein encoded by this gene contains a WW domain in the N-terminal region and a BAG domain in the C-terminal region. The BAG domains of BAG1, BAG2, and BAG3 interact specifically with the Hsc70 ATPase domain in vitro and in mammalian cells. All 3 proteins bind with high affinity to the ATPase domain of Hsc70 and inhibit its chaperone activity in a Hip-repressible manner. ENSG00000151929 BCL2 associated athanogene 3 NA
PDPN 10630 This gene encodes a type-I integral membrane glycoprotein with diverse distribution in human tissues. The physiological function of this protein may be related to its mucin-type character. The homologous protein in other species has been described as a differentiation antigen and influenza-virus receptor. The specific function of this protein has not been determined but it has been proposed as a marker of lung injury. Alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000162493 podoplanin NA
MYLK 4638 This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3’ region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts. ENSG00000065534 myosin light chain kinase NA
RP11-6O2.3 ENSG00000261616 NA ENSG00000261616 NA NA
HSPB8 26353 The protein encoded by this gene belongs to the superfamily of small heat-shock proteins containing a conservative alpha-crystallin domain at the C-terminal part of the molecule. The expression of this gene in induced by estrogen in estrogen receptor-positive breast cancer cells, and this protein also functions as a chaperone in association with Bag3, a stimulator of macroautophagy. Thus, this gene appears to be involved in regulation of cell proliferation, apoptosis, and carcinogenesis, and mutations in this gene have been associated with different neuromuscular diseases, including Charcot-Marie-Tooth disease. ENSG00000152137 heat shock protein family B (small) member 8 NA
MICALL1 85377 NA ENSG00000100139 MICAL like 1 NA
CCDC181 57821 NA ENSG00000117477 coiled-coil domain containing 181 NA
HSPA2 3306 NA ENSG00000126803 heat shock protein family A (Hsp70) member 2 NA
RARG 5916 This gene encodes a retinoic acid receptor that belongs to the nuclear hormone receptor family. Retinoic acid receptors (RARs) act as ligand-dependent transcriptional regulators. When bound to ligands, RARs activate transcription by binding as heterodimers to the retinoic acid response elements (RARE) found in the promoter regions of the target genes. In their unbound form, RARs repress transcription of their target genes. RARs are involved in various biological processes, including limb bud development, skeletal growth, and matrix homeostasis. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000172819 retinoic acid receptor gamma NA
TMEM132A 54972 This gene encodes a protein that is highly similar to the rat Grp78-binding protein (GBP). Alternatively spliced transcript variants encoding different isoforms have been described. ENSG00000006118 transmembrane protein 132A NA
MUCL1 118430 NA ENSG00000172551 mucin like 1 NA
SLC25A25 114789 The protein encoded by this gene belongs to the family of calcium-binding mitochondrial carriers, with a characteristic mitochondrial carrier domain at the C-terminus. These proteins are found in the inner membranes of mitochondria, and function as transport proteins. They shuttle metabolites, nucleotides and cofactors through the mitochondrial membrane and thereby connect and/or regulate cytoplasm and matrix functions. This protein may function as an ATP-Mg/Pi carrier that mediates the transport of Mg-ATP in exchange for phosphate, and likely responsible for the net uptake or efflux of adenine nucleotides into or from the mitochondria. Alternatively spliced transcript variants encoding different isoforms with a common C-terminus but variable N-termini have been described for this gene. ENSG00000148339 solute carrier family 25 member 25 NA
CKB 1152 The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in brain as well as in other tissues, and as a heterodimer with a similar muscle isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. A pseudogene of this gene has been characterized. ENSG00000166165 creatine kinase B NA
CPT1C 126129 This gene encodes a member of the carnitine/choline acetyltransferase family. The encoded protein regulates the beta-oxidation and transport of long-chain fatty acids into mitochondria, and may play a role in the regulation of feeding behavior and whole-body energy homeostasis. Alternatively spliced transcript variants encoding multiple protein isoforms have been observed for this gene. ENSG00000169169 carnitine palmitoyltransferase 1C NA
ANKRD9 122416 NA ENSG00000156381 ankyrin repeat domain 9 NA
RP11-46J23.1 ENSG00000272986 NA ENSG00000272986 NA NA
LOC101929777 101929777 NA ENSG00000108379 uncharacterized LOC101929777 NA
WNT3 7473 The WNT gene family consists of structurally related genes which encode secreted signaling proteins. These proteins have been implicated in oncogenesis and in several developmental processes, including regulation of cell fate and patterning during embryogenesis. This gene is a member of the WNT gene family. It encodes a protein which shows 98% amino acid identity to mouse Wnt3 protein, and 84% to human WNT3A protein, another WNT gene product. The mouse studies show the requirement of Wnt3 in primary axis formation in the mouse. Studies of the gene expression suggest that this gene may play a key role in some cases of human breast, rectal, lung, and gastric cancer through activation of the WNT-beta-catenin-TCF signaling pathway. This gene is clustered with WNT15, another family member, in the chromosome 17q21 region. ENSG00000108379 Wnt family member 3 NA
RP5-1126H10.2 ENSG00000272084 NA ENSG00000272084 NA NA
AK7 122481 NA ENSG00000140057 adenylate kinase 7 NA
TEAD3 7005 This gene product is a member of the transcriptional enhancer factor (TEF) family of transcription factors, which contain the TEA/ATTS DNA-binding domain. It is predominantly expressed in the placenta and is involved in the transactivation of the chorionic somatomammotropin-B gene enhancer. Translation of this protein is initiated at a non-AUG (AUA) start codon. ENSG00000007866 TEA domain transcription factor 3 NA
TNFAIP8L1 126282 NA ENSG00000185361 TNF alpha induced protein 8 like 1 NA
CYP3A5 1577 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. The encoded protein metabolizes drugs as well as the steroid hormones testosterone and progesterone. This gene is part of a cluster of cytochrome P450 genes on chromosome 7q21.1. Two pseudogenes of this gene have been identified within this cluster on chromosome 7. Expression of this gene is widely variable among populations, and a single nucleotide polymorphism that affects transcript splicing has been associated with susceptibility to hypertensions. Alternative splicing results in multiple transcript variants. ENSG00000106258 cytochrome P450 family 3 subfamily A member 5 NA
NT5DC3 51559 NA ENSG00000111696 5’-nucleotidase domain containing 3 NA
NXPH3 11248 NA ENSG00000182575 neurexophilin 3 NA
PGAM5 192111 NA ENSG00000247077 PGAM family member 5, mitochondrial serine/threonine protein phosphatase NA
VSIG2 23584 NA ENSG00000019102 V-set and immunoglobulin domain containing 2 NA
RP11-1055B8.4 ENSG00000262877 NA ENSG00000262877 NA NA
TUFT1 7286 Tuftelin is an acidic protein that is thought to play a role in dental enamel mineralization and is implicated in caries susceptibility. It is also thought to be involved with adaptation to hypoxia, mesenchymal stem cell function, and neurotrophin nerve growth factor mediated neuronal differentiation. ENSG00000143367 tuftelin 1 NA
MFHAS1 9258 Identified in a human 8p amplicon, this gene is a potential oncogene whose expression is enhanced in some malignant fibrous histiocytomas (MFH). The primary structure of its product includes an ATP/GTP-binding site, three leucine zipper domains, and a leucine-rich tandem repeat, which are structural or functional elements for interactions among proteins related to the cell cycle, and which suggest that overexpression might be oncogenic with respect to MFH. ENSG00000147324 malignant fibrous histiocytoma amplified sequence 1 NA
TMEM79 84283 NA ENSG00000163472 transmembrane protein 79 NA
DNAJB5 25822 DNAJB5 belongs to the evolutionarily conserved DNAJ/HSP40 protein family. For background information on the DNAJ family, see MIM 608375. ENSG00000137094 DnaJ heat shock protein family (Hsp40) member B5 NA
RP11-196G11.2 ENSG00000260911 NA ENSG00000260911 NA NA
CSF3 1440 The protein encoded by this gene is a cytokine that controls the production, differentiation, and function of granulocytes. The active protein is found extracellularly. Alternatively spliced transcript variants have been described for this gene. ENSG00000108342 colony stimulating factor 3 NA
TSPAN5 10098 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. ENSG00000168785 tetraspanin 5 NA
AF001548.5 ENSG00000263335 NA ENSG00000263335 NA NA
IL34 146433 Interleukin-34 is a cytokine that promotes the differentiation and viability of monocytes and macrophages through the colony-stimulating factor-1 receptor (CSF1R; MIM 164770) (Lin et al., 2008 [PubMed 18467591]). ENSG00000157368 interleukin 34 NA
CHRNA5 1138 The protein encoded by this gene is a nicotinic acetylcholine receptor subunit and a member of a superfamily of ligand-gated ion channels that mediate fast signal transmission at synapses. These receptors are thought to be heteropentamers composed of separate but similar subunits. Defects in this gene have been linked to susceptibility to lung cancer type 2 (LNCR2). ENSG00000169684 cholinergic receptor nicotinic alpha 5 subunit NA
MFSD2A 84879 NA ENSG00000168389 major facilitator superfamily domain containing 2A NA
SERBP1P3 ENSG00000242142 NA ENSG00000242142 SERPINE1 mRNA binding protein 1 pseudogene 3 NA
NA NA NA ENSG00000182319 NA TRUE
HES6 55502 This gene encodes a member of a subfamily of basic helix-loop-helix transcription repressors that have homology to the Drosophila enhancer of split genes. Members of this gene family regulate cell differentiation in numerous cell types. The protein encoded by this gene functions as a cofactor, interacting with other transcription factors through a tetrapeptide domain in its C-terminus. Alternatively spliced transcript variants encoding different isoforms have been described. ENSG00000144485 hes family bHLH transcription factor 6 NA
RGMB 285704 RGMB is a glycosylphosphatidylinositol (GPI)-anchored member of the repulsive guidance molecule family (see RGMA, MIM 607362) and contributes to the patterning of the developing nervous system (Samad et al., 2005 [PubMed 15671031]). ENSG00000174136 repulsive guidance molecule family member b NA
RP5-1092A3.4 ENSG00000270605 NA ENSG00000270605 NA NA
SNCAIP 9627 This gene encodes a protein containing several protein-protein interaction domains, including ankyrin-like repeats, a coiled-coil domain, and an ATP/GTP-binding motif. The encoded protein interacts with alpha-synuclein in neuronal tissue and may play a role in the formation of cytoplasmic inclusions and neurodegeneration. A mutation in this gene has been associated with Parkinson’s disease. Alternative splicing results in multiple transcript variants. ENSG00000064692 synuclein alpha interacting protein NA
SLC45A3 85414 NA ENSG00000158715 solute carrier family 45 member 3 NA
PTGES3L 100885848 NA ENSG00000267060 prostaglandin E synthase 3 (cytosolic)-like NA
SLC7A5 8140 NA ENSG00000103257 solute carrier family 7 member 5 NA
CD200R1 131450 This gene encodes a receptor for the OX-2 membrane glycoprotein. Both the receptor and substrate are cell surface glycoproteins containing two immunoglobulin-like domains. This receptor is restricted to the surfaces of myeloid lineage cells and the receptor-substrate interaction may function as a myeloid downregulatory signal. Mouse studies of a related gene suggest that this interaction may control myeloid function in a tissue-specific manner. Alternative splicing of this gene results in multiple transcript variants. ENSG00000163606 CD200 receptor 1 NA
MST1L ENSG00000186715 NA ENSG00000186715 macrophage stimulating 1-like NA
RP4-536B24.2 ENSG00000260466 NA ENSG00000260466 NA NA
LSM11 134353 NA ENSG00000155858 LSM11, U7 small nuclear RNA associated NA
PLGLB1 5343 NA ENSG00000183281 plasminogen-like B1 NA
OMG 4974 NA ENSG00000126861 oligodendrocyte myelin glycoprotein NA
LOC102723927 102723927 NA ENSG00000261186 uncharacterized LOC102723927 NA
RPS20P21 ENSG00000244295 NA ENSG00000244295 ribosomal protein S20 pseudogene 21 NA
NCR3LG1 374383 B7H6 belongs to the B7 family (see MIM 605402) and is selectively expressed on tumor cells. Interaction of B7H6 with NKp30 (NCR3; MIM 611550) results in natural killer (NK) cell activation and cytotoxicity (Brandt et al., 2009 [PubMed 19528259]). ENSG00000188211 natural killer cell cytotoxicity receptor 3 ligand 1 NA
CAPN5 726 Calpains are calcium-dependent cysteine proteases involved in signal transduction in a variety of cellular processes. A functional calpain protein consists of an invariant small subunit and 1 of a family of large subunits. CAPN5 is one of the large subunits. Unlike some of the calpains, CAPN5 and CAPN6 lack a calmodulin-like domain IV. Because of the significant similarity to Caenorhabditis elegans sex determination gene tra-3, CAPN5 is also called as HTRA3. ENSG00000149260 calpain 5 NA
KRT19 3880 The protein encoded by this gene is a member of the keratin family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. The type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. Unlike its related family members, this smallest known acidic cytokeratin is not paired with a basic cytokeratin in epithelial cells. It is specifically expressed in the periderm, the transiently superficial layer that envelopes the developing epidermis. The type I cytokeratins are clustered in a region of chromosome 17q12-q21. ENSG00000171345 keratin 19 NA
PER2 8864 This gene is a member of the Period family of genes and is expressed in a circadian pattern in the suprachiasmatic nucleus, the primary circadian pacemaker in the mammalian brain. Genes in this family encode components of the circadian rhythms of locomotor activity, metabolism, and behavior. This gene is upregulated by CLOCK/ARNTL heterodimers but then represses this upregulation in a feedback loop using PER/CRY heterodimers to interact with CLOCK/ARNTL. Polymorphisms in this gene may increase the risk of getting certain cancers and have been linked to sleep disorders. ENSG00000132326 period circadian clock 2 NA
RP1-193H18.2 ENSG00000267194 NA ENSG00000267194 NA NA
# Save the Ensembl IDs queried for factor 5, one per line.
write.table(out$query,
            file = paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 5, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
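
The same write step is repeated for every factor, so it could be wrapped in a small helper. The sketch below is only an illustration: `write_factor_genes()` and its arguments are hypothetical names, not part of the original analysis, and the demo writes to a temporary directory rather than to `../utilities/`. The two IDs are the CAPN5 and KRT19 entries from the Factor 5 table.

```r
# Hypothetical helper (not part of the original analysis): write the Ensembl IDs
# of factor k to "<out_dir>/gene_names_clus_<k>.txt", one ID per line.
write_factor_genes <- function(ids, k, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(ids, file = path,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)  # return the path so callers can check or reuse it
}

# Demo with two IDs from the Factor 5 table, written to a temporary directory.
demo_ids <- c("ENSG00000149260", "ENSG00000171345")
demo_path <- write_factor_genes(demo_ids, 5, tempdir())
```

In the report itself the call would presumably be `write_factor_genes(out$query, 5, "../utilities/GTEX2013_sparse_load_voom")`, assuming that directory already exists.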

Factor 6 Annotations

out <- mygene::queryMany(gene_list[6, ], scopes = "ensembl.gene", fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query X_id name summary symbol notfound
ENSG00000108342 1440 colony stimulating factor 3 The protein encoded by this gene is a cytokine that controls the production, differentiation, and function of granulocytes. The active protein is found extracellularly. Alternatively spliced transcript variants have been described for this gene. CSF3 NA
ENSG00000115170 90 activin A receptor type 1 Activins are dimeric growth and differentiation factors which belong to the transforming growth factor-beta (TGF-beta) superfamily of structurally related signaling proteins. Activins signal through a heteromeric complex of receptor serine kinases which include at least two type I (I and IB) and two type II (II and IIB) receptors. These receptors are all transmembrane proteins, composed of a ligand-binding extracellular domain with cysteine-rich region, a transmembrane domain, and a cytoplasmic domain with predicted serine/threonine specificity. Type I receptors are essential for signaling; and type II receptors are required for binding ligands and for expression of type I receptors. Type I and II receptors form a stable complex after ligand binding, resulting in phosphorylation of type I receptors by type II receptors. This gene encodes activin A type I receptor which signals a particular transcriptional response in concert with activin type II receptors. Mutations in this gene are associated with fibrodysplasia ossificans progressiva. ACVR1 NA
ENSG00000013583 50865 heme binding protein 1 The full-length protein encoded by this gene is an intracellular tetrapyrrole-binding protein. This protein includes a natural chemoattractant peptide of 21 amino acids at the N-terminus, which is a natural ligand for formyl peptide receptor-like receptor 2 (FPRL2) and promotes calcium mobilization and chemotaxis in monocytes and dendritic cells. HEBP1 NA
ENSG00000185950 8660 insulin receptor substrate 2 This gene encodes the insulin receptor substrate 2, a cytoplasmic signaling molecule that mediates effects of insulin, insulin-like growth factor 1, and other cytokines by acting as a molecular adaptor between diverse receptor tyrosine kinases and downstream effectors. The product of this gene is phosphorylated by the insulin receptor tyrosine kinase upon receptor stimulation, as well as by an interleukin 4 receptor-associated kinase in response to IL4 treatment. IRS2 NA
ENSG00000123384 4035 LDL receptor related protein 1 This gene encodes a member of the low-density lipoprotein receptor family of proteins. The encoded preproprotein is proteolytically processed by furin to generate 515 kDa and 85 kDa subunits that form the mature receptor (PMID: 8546712). This receptor is involved in several cellular processes, including intracellular signaling, lipid homeostasis, and clearance of apoptotic cells. In addition, the encoded protein is necessary for the alpha 2-macroglobulin-mediated clearance of secreted amyloid precursor protein and beta-amyloid, the main component of amyloid plaques found in Alzheimer patients. Expression of this gene decreases with age and has been found to be lower than controls in brain tissue from Alzheimer’s disease patients. LRP1 NA
ENSG00000255813 NA NA NA NA TRUE
ENSG00000198363 444 aspartate beta-hydroxylase This gene is thought to play an important role in calcium homeostasis. The gene is expressed from two promoters and undergoes extensive alternative splicing. The encoded set of proteins share varying amounts of overlap near their N-termini but have substantial variations in their C-terminal domains resulting in distinct functional properties. The longest isoforms (a and f) include a C-terminal Aspartyl/Asparaginyl beta-hydroxylase domain that hydroxylates aspartic acid or asparagine residues in the epidermal growth factor (EGF)-like domains of some proteins, including protein C, coagulation factors VII, IX, and X, and the complement factors C1R and C1S. Other isoforms differ primarily in the C-terminal sequence and lack the hydroxylase domain, and some have been localized to the endoplasmic and sarcoplasmic reticulum. Some of these isoforms are found in complexes with calsequestrin, triadin, and the ryanodine receptor, and have been shown to regulate calcium release from the sarcoplasmic reticulum. Some isoforms have been implicated in metastasis. ASPH NA
ENSG00000196372 79754 ankyrin repeat and SOCS box containing 13 The protein encoded by this gene is a member of the ankyrin repeat and SOCS box-containing (ASB) family of proteins. They contain ankyrin repeat sequence and a SOCS box domain. The SOCS box serves to couple suppressor of cytokine signalling (SOCS) proteins and their binding partners with the elongin B and C complex, possibly targeting them for degradation. Multiple alternatively spliced transcript variants, both protein-coding and not protein-coding, have been described for this gene. ASB13 NA
ENSG00000132669 54453 Ras and Rab interactor 2 The RAB5 protein is a small GTPase involved in membrane trafficking in the early endocytic pathway. The protein encoded by this gene binds the GTP-bound form of the RAB5 protein preferentially over the GDP-bound form, and functions as a guanine nucleotide exchange factor for RAB5. The encoded protein is found primarily as a tetramer in the cytoplasm and does not bind other members of the RAB family. Mutations in this gene cause macrocephaly alopecia cutis laxa and scoliosis (MACS) syndrome, an elastic tissue disorder, as well as the related connective tissue disorder, RIN2 syndrome. Alternative splicing results in multiple transcript variants. RIN2 NA
ENSG00000233547 ENSG00000233547 NA NA RP11-57H14.2 NA
ENSG00000198478 83699 SH3 domain binding glutamate rich protein like 2 NA SH3BGRL2 NA
ENSG00000124588 4835 NAD(P)H quinone dehydrogenase 2 This gene encodes a member of the thioredoxin family of enzymes. It is a cytosolic and ubiquitously expressed flavoprotein that catalyzes the two-electron reduction of quinone substrates and uses dihydronicotinamide riboside as a reducing coenzyme. Mutations in this gene have been associated with neurodegenerative diseases and several cancers. Alternative splicing results in multiple transcript variants. NQO2 NA
ENSG00000157107 115548 FCH domain only 2 NA FCHO2 NA
ENSG00000170873 9788 metastasis suppressor 1 NA MTSS1 NA
ENSG00000113269 55819 ring finger protein 130 The protein encoded by this gene contains a RING finger motif and is similar to g1, a Drosophila zinc-finger protein that is expressed in mesoderm and involved in embryonic development. The expression of the mouse counterpart was found to be upregulated in myeloblastic cells following IL3 deprivation, suggesting that this gene may regulate growth factor withdrawal-induced apoptosis of myeloid precursor cells. Alternative splicing results in multiple transcript variants. RNF130 NA
ENSG00000131507 80762 Nedd4 family interacting protein 1 The protein encoded by this gene belongs to a small group of evolutionarily conserved proteins with three transmembrane domains. It is a potential target for ubiquitination by the Nedd4 family of proteins. This protein is thought to be part of a family of integral Golgi membrane proteins. NDFIP1 NA
ENSG00000123933 10608 MAX dimerization protein 4 This gene is a member of the MAD gene family. The MAD genes encode basic helix-loop-helix-leucine zipper proteins that heterodimerize with MAX protein, forming a transcriptional repression complex. The MAD proteins compete for MAX binding with MYC, which heterodimerizes with MAX forming a transcriptional activation complex. Studies in rodents suggest that the MAD genes are tumor suppressors and contribute to the regulation of cell growth in differentiating tissues. MXD4 NA
ENSG00000263640 ENSG00000263640 NA NA AF235103.1 NA
ENSG00000178226 146547 protease, serine 36 NA PRSS36 NA
ENSG00000267543 ENSG00000267543 NA NA RP11-666A8.7 NA
ENSG00000144115 55258 threonine synthase like 2 This gene encodes a threonine synthase-like protein. A similar enzyme in mouse can catalyze the degradation of O-phospho-homoserine to a-ketobutyrate, phosphate, and ammonia. This protein also has phospho-lyase activity on both gamma and beta phosphorylated substrates. In mouse an alternatively spliced form of this protein has been shown to act as a cytokine and can induce the production of the inflammatory cytokine IL6 in osteoblasts. Alternate splicing results in multiple transcript variants. THNSL2 NA
ENSG00000225190 9842 pleckstrin homology and RUN domain containing M1 The protein encoded by this gene is essential for bone resorption, and may play a critical role in vesicular transport in the osteoclast. Mutations in this gene are associated with autosomal recessive osteopetrosis type 6 (OPTB6). Alternatively spliced transcript variants have been found for this gene. PLEKHM1 NA
ENSG00000260306 ENSG00000260306 NA NA RP11-645C24.5 NA
ENSG00000264043 NA NA NA NA TRUE
ENSG00000185340 10634 growth arrest specific 2 like 1 This gene encodes a member of the growth arrest-specific 2 protein family. This protein binds components of the cytoskeleton and may be involved in mediating interactions between microtubules and microfilaments. Alternate splicing results in multiple transcript variants. A pseudogene of this gene is found on chromosome 9. GAS2L1 NA
ENSG00000171055 9637 fasciculation and elongation protein zeta 2 This gene is an ortholog of the C. elegans unc-76 gene, which is necessary for normal axonal bundling and elongation within axon bundles. Other orthologs include the rat gene that encodes zygin II, which can bind to synaptotagmin. FEZ2 NA
ENSG00000230537 100507103 uncharacterized LOC100507103 NA LOC100507103 NA
ENSG00000164733 1508 cathepsin B This gene encodes a member of the C1 family of peptidases. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate multiple protein products. These products include the cathepsin B light and heavy chains, which can dimerize to form the double chain form of the enzyme. This enzyme is a lysosomal cysteine protease with both endopeptidase and exopeptidase activity that may play a role in protein turnover. It is also known as amyloid precursor protein secretase and is involved in the proteolytic processing of amyloid precursor protein (APP). Incomplete proteolytic processing of APP has been suggested to be a causative factor in Alzheimer’s disease, the most common cause of dementia. Overexpression of the encoded protein has been associated with esophageal adenocarcinoma and other tumors. Multiple pseudogenes of this gene have been identified. CTSB NA
ENSG00000005882 5164 pyruvate dehydrogenase kinase 2 This gene encodes a member of the pyruvate dehydrogenase kinase family. The encoded protein phosphorylates pyruvate dehydrogenase, down-regulating the activity of the mitochondrial pyruvate dehydrogenase complex. Overexpression of this gene may play a role in both cancer and diabetes. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. PDK2 NA
ENSG00000156052 2776 G protein subunit alpha q This locus encodes a guanine nucleotide-binding protein. The encoded protein, an alpha subunit in the Gq class, couples a seven-transmembrane domain receptor to activation of phospholipase C-beta. Mutations at this locus have been associated with problems in platelet activation and aggregation. A related pseudogene exists on chromosome 2. GNAQ NA
ENSG00000167705 83547 Rab interacting lysosomal protein This gene encodes a lysosomal protein that interacts with RAB7, a small GTPase that controls transport to endocytic degradative compartments. Studies using mutant forms of the two proteins suggest that this protein represents a downstream effector for RAB7, and both proteins act together in the regulation of late endocytic traffic. A unique region of this protein has also been shown to be involved in the regulation of lysosomal morphology. RILP NA
ENSG00000133812 81846 SET binding factor 2 This gene encodes a pseudophosphatase and member of the myotubularin-related protein family. This gene maps within the CMT4B2 candidate region of chromosome 11p15 and mutations in this gene have been associated with Charcot-Marie-Tooth Disease, type 4B2. SBF2 NA
ENSG00000157978 26119 low density lipoprotein receptor adaptor protein 1 The protein encoded by this gene is a cytosolic protein which contains a phosphotyrosine binding (PTD) domain. The PTD domain has been found to interact with the cytoplasmic tail of the LDL receptor. Mutations in this gene lead to LDL receptor malfunction and cause the disorder autosomal recessive hypercholesterolaemia. LDLRAP1 NA
ENSG00000004399 23129 plexin D1 NA PLXND1 NA
ENSG00000261269 ENSG00000261269 NA NA RP11-389C8.2 NA
ENSG00000166033 5654 HtrA serine peptidase 1 This gene encodes a member of the trypsin family of serine proteases. This protein is a secreted enzyme that is proposed to regulate the availability of insulin-like growth factors (IGFs) by cleaving IGF-binding proteins. It has also been suggested to be a regulator of cell growth. Variations in the promoter region of this gene are the cause of susceptibility to age-related macular degeneration type 7. HTRA1 NA
ENSG00000158828 65018 PTEN induced putative kinase 1 This gene encodes a serine/threonine protein kinase that localizes to mitochondria. It is thought to protect cells from stress-induced mitochondrial dysfunction. Mutations in this gene cause one form of autosomal recessive early-onset Parkinson disease. PINK1 NA
ENSG00000256845 NA NA NA NA TRUE
ENSG00000114019 51421 angiomotin like 2 Angiomotin is a protein that binds angiostatin, a circulating inhibitor of the formation of new blood vessels (angiogenesis). Angiomotin mediates angiostatin inhibition of endothelial cell migration and tube formation in vitro. The protein encoded by this gene is related to angiomotin and is a member of the motin protein family. Alternative splicing results in multiple transcript variants of this gene. AMOTL2 NA
ENSG00000235033 100505635 uncharacterized LOC100505635 NA LOC100505635 NA
ENSG00000138604 26035 glucuronic acid epimerase NA GLCE NA
ENSG00000090674 57192 mucolipin 1 This gene encodes a member of the transient receptor potential (TRP) cation channel gene family. The transmembrane protein localizes to intracellular vesicular membranes including lysosomes, and functions in the late endocytic pathway and in the regulation of lysosomal exocytosis. The channel is permeable to Ca(2+), Fe(2+), Na(+), K(+), and H(+), and is modulated by changes in Ca(2+) concentration. Mutations in this gene result in mucolipidosis type IV. MCOLN1 NA
ENSG00000155792 64798 DEP domain containing MTOR-interacting protein NA DEPTOR NA
ENSG00000180891 404093 CUE domain containing 1 NA CUEDC1 NA
ENSG00000048342 57545 coiled-coil and C2 domain containing 2A This gene encodes a coiled-coil and calcium binding domain protein that appears to play a critical role in cilia formation. Mutations in this gene cause Meckel syndrome type 6, as well as Joubert syndrome type 9. Alternative splicing results in multiple transcript variants. CC2D2A NA
ENSG00000109654 23321 tripartite motif containing 2 The protein encoded by this gene is a member of the tripartite motif (TRIM) family. The TRIM motif includes three zinc-binding domains, a RING, a B-box type 1 and a B-box type 2, and a coiled-coil region. The protein localizes to cytoplasmic filaments. It plays a neuroprotective role and functions as an E3-ubiquitin ligase in proteasome-mediated degradation of target proteins. Mutations in this gene can cause early-onset axonal neuropathy. Alternative splicing results in multiple transcript variants. TRIM2 NA
ENSG00000215769 146880 Rho GTPase activating protein 27 pseudogene NA LOC146880 NA
ENSG00000106524 57037 ankyrin repeat and MYND domain containing 2 NA ANKMY2 NA
ENSG00000267546 ENSG00000267546 NA NA RP11-666A8.8 NA
ENSG00000273219 ENSG00000273219 NA NA RP11-644N4.1 NA
ENSG00000138463 84925 disrupted in renal carcinoma 2 This gene encodes a membrane-bound protein from the major facilitator superfamily of transporters. Disruption of this gene by translocation has been associated with haplo-insufficiency and renal cell carcinomas. Alternatively spliced transcript variants have been described, but their biological validity has not yet been determined. DIRC2 NA
ENSG00000178573 4094 MAF bZIP transcription factor The protein encoded by this gene is a DNA-binding, leucine zipper-containing transcription factor that acts as a homodimer or as a heterodimer. Depending on the binding site and binding partner, the encoded protein can be a transcriptional activator or repressor. This protein plays a role in the regulation of several cellular processes, including embryonic lens fiber cell development, increased T-cell susceptibility to apoptosis, and chondrocyte terminal differentiation. Defects in this gene are a cause of juvenile-onset pulverulent cataract as well as congenital cerulean cataract 4 (CCA4). Two transcript variants encoding different isoforms have been found for this gene. MAF NA
ENSG00000176834 54621 V-set and immunoglobulin domain containing 10 NA VSIG10 NA
ENSG00000132694 9826 Rho guanine nucleotide exchange factor 11 Rho GTPases play a fundamental role in numerous cellular processes that are initiated by extracellular stimuli that work through G protein coupled receptors. The encoded protein may form a complex with G proteins and stimulate Rho-dependent signals. A similar protein in rat interacts with glutamate transporter EAAT4 and modulates its glutamate transport activity. Expression of the rat protein induces the reorganization of the actin cytoskeleton and its overexpression induces the formation of membrane ruffling and filopodia. Two alternative transcripts encoding different isoforms have been described. ARHGEF11 NA
ENSG00000142552 57333 reticulocalbin 3 NA RCN3 NA
ENSG00000084234 334 amyloid beta precursor like protein 2 This gene encodes amyloid precursor-like protein 2 (APLP2), which is a member of the APP (amyloid precursor protein) family including APP, APLP1 and APLP2. This protein is ubiquitously expressed. It contains heparin-, copper- and zinc-binding domains at the N-terminus, BPTI/Kunitz inhibitor and E2 domains in the middle region, and transmembrane and intracellular domains at the C-terminus. This protein interacts with major histocompatibility complex (MHC) class I molecules. The synergy of this protein and the APP is required to mediate neuromuscular transmission, spatial learning and synaptic plasticity. This protein has been implicated in the pathogenesis of Alzheimer’s disease. Multiple alternatively spliced transcript variants encoding different isoforms have been identified. APLP2 NA
ENSG00000160094 149076 zinc finger protein 362 NA ZNF362 NA
ENSG00000116005 51449 prenylcysteine oxidase 1 Prenylcysteine is released during the degradation of prenylated proteins. PCYOX1 catalyzes the degradation of prenylcysteine to yield free cysteines and a hydrophobic isoprenoid product (Tschantz et al., 1999 [PubMed 10585463]). PCYOX1 NA
ENSG00000248019 285512 FAM13A antisense RNA 1 NA FAM13A-AS1 NA
ENSG00000131069 55902 acyl-CoA synthetase short-chain family member 2 This gene encodes a cytosolic enzyme that catalyzes the activation of acetate for use in lipid synthesis and energy generation. The protein acts as a monomer and produces acetyl-CoA from acetate in a reaction that requires ATP. Expression of this gene is regulated by sterol regulatory element-binding proteins, transcription factors that activate genes required for the synthesis of cholesterol and unsaturated fatty acids. Alternative splicing results in multiple transcript variants. ACSS2 NA
ENSG00000166548 7084 thymidine kinase 2, mitochondrial This gene encodes a deoxyribonucleoside kinase that specifically phosphorylates thymidine, deoxycytidine, and deoxyuridine. The encoded enzyme localizes to the mitochondria and is required for mitochondrial DNA synthesis. Mutations in this gene are associated with a myopathic form of mitochondrial DNA depletion syndrome. Alternate splicing results in multiple transcript variants encoding distinct isoforms, some of which lack transit peptide, so are not localized to mitochondria. TK2 NA
ENSG00000054793 10079 ATPase phospholipid transporting 9A (putative) NA ATP9A NA
ENSG00000161912 221442 adenylate cyclase 10 (soluble) pseudogene 1 NA ADCY10P1 NA
ENSG00000125430 9953 heparan sulfate-glucosamine 3-sulfotransferase 3B1 The protein encoded by this gene is a type II integral membrane protein that belongs to the 3-O-sulfotransferases family. These proteins catalyze the addition of sulfate groups at the 3-OH position of glucosamine in heparan sulfate. The substrate specificity of individual members of the family is based on prior modification of the heparan sulfate chain, thus allowing different members of the family to generate binding sites for different proteins on the same heparan sulfate chain. Following treatment with a histone deacetylase inhibitor, expression of this gene is activated in a pancreatic cell line. The increased expression results in promotion of the epithelial-mesenchymal transition. In addition, the modification catalyzed by this protein allows herpes simplex virus membrane fusion and penetration. A very closely related homolog with an almost identical sulfotransferase domain maps less than 1 Mb away. Alternative splicing results in multiple transcript variants. HS3ST3B1 NA
ENSG00000227201 ENSG00000227201 calponin 2 pseudogene 1 NA CNN2P1 NA
ENSG00000261064 ENSG00000261064 NA NA RP11-1000B6.3 NA
ENSG00000124593 29964 prickle planar cell polarity protein 4 C6ORF49 is a member of the LIM domain protein family (Teufel et al., 2005 [PubMed 15702247]). PRICKLE4 NA
ENSG00000200278 ENSG00000200278 RNA, 5S ribosomal pseudogene 352 NA RNA5SP352 NA
ENSG00000167107 80221 acyl-CoA synthetase family member 2 NA ACSF2 NA
ENSG00000230633 NA NA NA NA TRUE
ENSG00000065154 4942 ornithine aminotransferase This gene encodes the mitochondrial enzyme ornithine aminotransferase, which is a key enzyme in the pathway that converts arginine and ornithine into the major excitatory and inhibitory neurotransmitters glutamate and GABA. Mutations that result in a deficiency of this enzyme cause the autosomal recessive eye disease Gyrate Atrophy. Alternatively spliced transcript variants encoding different isoforms have been described. Related pseudogenes have been defined on the X chromosome. OAT NA
ENSG00000143842 9580 SRY-box 13 This gene encodes a member of the SOX (SRY-related HMG-box) family of transcription factors involved in the regulation of embryonic development and in the determination of cell fate. The encoded protein may act as a transcriptional regulator after forming a protein complex with other proteins. It has also been determined to be a type-1 diabetes autoantigen, also known as islet cell antibody 12. SOX13 NA
ENSG00000135407 10677 advillin The protein encoded by this gene is a member of the gelsolin/villin family of actin regulatory proteins. This protein has structural similarity to villin. It binds actin and may play a role in the development of neuronal cells that form ganglia. AVIL NA
ENSG00000178814 26873 5-oxoprolinase (ATP-hydrolysing) The protein encoded by this gene acts as a homodimer, using ATP hydrolysis to catalyze the conversion of 5-oxo-L-proline to L-glutamate. Defects in this gene are a cause of 5-oxoprolinase deficiency (OPLAHD). OPLAH NA
ENSG00000042445 54884 retinol saturase NA RETSAT NA
ENSG00000089159 5829 paxillin This gene encodes a cytoskeletal protein involved in actin-membrane attachment at sites of cell adhesion to the extracellular matrix (focal adhesion). Alternatively spliced transcript variants encoding different isoforms have been described for this gene. These isoforms exhibit different expression patterns, and have different biochemical, as well as physiological properties (PMID:9054445). PXN NA
ENSG00000110013 54414 sialic acid acetylesterase This gene encodes an enzyme which removes 9-O-acetylation modifications from sialic acids. Mutations in this gene are associated with susceptibility to autoimmune disease 6. Multiple transcript variants encoding different isoforms, found either in the cytosol or in the lysosome, have been found for this gene. SIAE NA
ENSG00000111897 57515 serine incorporator 1 NA SERINC1 NA
ENSG00000255857 ENSG00000255857 PXN antisense RNA 1 NA PXN-AS1 NA
ENSG00000256142 NA NA NA NA TRUE
ENSG00000183723 146223 CKLF like MARVEL transmembrane domain containing 4 This gene belongs to the chemokine-like factor gene superfamily, a novel family that is similar to the chemokine and the transmembrane 4 superfamilies of signaling molecules. This gene is one of several chemokine-like factor genes located in a cluster on chromosome 16. Alternatively spliced transcript variants encoding different isoforms have been identified. CMTM4 NA
ENSG00000197872 81553 family with sequence similarity 49 member A NA FAM49A NA
ENSG00000272091 NA NA NA NA TRUE
ENSG00000110931 10645 calcium/calmodulin-dependent protein kinase kinase 2 The product of this gene belongs to the Serine/Threonine protein kinase family, and to the Ca(2+)/calmodulin-dependent protein kinase subfamily. The major isoform of this gene plays a role in the calcium/calmodulin-dependent (CaM) kinase cascade by phosphorylating the downstream kinases CaMK1 and CaMK4. Protein products of this gene also phosphorylate AMP-activated protein kinase (AMPK). This gene has its strongest expression in the brain and influences signalling cascades involved with learning and memory, neuronal differentiation and migration, neurite outgrowth, and synapse formation. Alternative splicing results in multiple transcript variants encoding distinct isoforms. The identified isoforms differ in their ability to undergo autophosphorylation and to phosphorylate downstream kinases. CAMKK2 NA
ENSG00000269976 ENSG00000269976 NA NA RP11-130L8.2 NA
ENSG00000080573 50509 collagen type V alpha 3 This gene encodes an alpha chain for one of the low abundance fibrillar collagens. Fibrillar collagen molecules are trimers that can be composed of one or more types of alpha chains. Type V collagen is found in tissues containing type I collagen and appears to regulate the assembly of heterotypic fibers composed of both type I and type V collagen. This gene product is closely related to type XI collagen and it is possible that the collagen chains of types V and XI constitute a single collagen type with tissue-specific chain combinations. Mutations in this gene are thought to be responsible for the symptoms of a subset of patients with Ehlers-Danlos syndrome type III. Messages of several sizes can be detected in northern blots but sequence information cannot confirm the identity of the shorter messages. COL5A3 NA
ENSG00000101400 6640 syntrophin alpha 1 Syntrophins are cytoplasmic peripheral membrane scaffold proteins that are components of the dystrophin-associated protein complex. This gene is a member of the syntrophin gene family and encodes the most common syntrophin isoform found in cardiac tissues. The N-terminal PDZ domain of this syntrophin protein interacts with the C-terminus of the pore-forming alpha subunit (SCN5A) of the cardiac sodium channel Nav1.5. This protein also associates cardiac sodium channels with the nitric oxide synthase-PMCA4b (plasma membrane Ca-ATPase subtype 4b) complex in cardiomyocytes. This gene is a susceptibility locus for Long-QT syndrome (LQT) - an inherited disorder associated with sudden cardiac death from arrhythmia - and sudden infant death syndrome (SIDS). This protein also associates with dystrophin and dystrophin-related proteins at the neuromuscular junction and alters intracellular calcium ion levels in muscle tissue. SNTA1 NA
ENSG00000148180 2934 gelsolin The protein encoded by this gene binds to the ‘plus’ ends of actin monomers and filaments to prevent monomer exchange. The encoded calcium-regulated protein functions in both assembly and disassembly of actin filaments. Defects in this gene are a cause of familial amyloidosis Finnish type (FAF). Multiple transcript variants encoding several different isoforms have been found for this gene. GSN NA
ENSG00000211584 55652 solute carrier family 48 member 1 NA SLC48A1 NA
ENSG00000254317 ENSG00000254317 NA NA RP11-473O4.5 NA
ENSG00000237781 ENSG00000237781 NA NA RP11-54A4.2 NA
ENSG00000175662 146691 target of myb1 like 2 membrane trafficking protein NA TOM1L2 NA
ENSG00000168297 54899 PX domain containing serine/threonine kinase like This gene encodes a phox (PX) domain-containing protein which may be involved in synaptic transmission and the ligand-induced internalization and degradation of epidermal growth factors. Variations in this gene may be associated with susceptibility to systemic lupus erythematosus (SLE). Alternative splicing results in multiple transcript variants. PXK NA
ENSG00000165801 55701 Rho guanine nucleotide exchange factor 40 This gene encodes a protein similar to guanosine nucleotide exchange factors for Rho GTPases. The encoded protein contains in its C-terminus a GEF domain involved in exchange activity and a pleckstrin homology domain. Alternatively spliced transcripts that encode different proteins have been described. ARHGEF40 NA
ENSG00000122642 11328 FK506 binding protein 9 NA FKBP9 NA
ENSG00000133059 25778 dual serine/threonine and tyrosine protein kinase This gene encodes a dual serine/threonine and tyrosine protein kinase which is expressed in multiple tissues. It is thought to function as a regulator of cell death. Multiple transcript variants encoding different isoforms have been found for this gene. DSTYK NA
ENSG00000197746 5660 prosaposin This gene encodes a highly conserved preproprotein that is proteolytically processed to generate four main cleavage products including saposins A, B, C, and D. Each domain of the precursor protein is approximately 80 amino acid residues long with nearly identical placement of cysteine residues and glycosylation sites. Saposins A-D localize primarily to the lysosomal compartment where they facilitate the catabolism of glycosphingolipids with short oligosaccharide groups. The precursor protein exists both as a secretory protein and as an integral membrane protein and has neurotrophic activities. Mutations in this gene have been associated with Gaucher disease and metachromatic leukodystrophy. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. PSAP NA
ENSG00000099290 387680 family with sequence similarity 21 member A NA FAM21A NA
ENSG00000260231 100134229 JHDM1D antisense RNA 1 (head to head) NA JHDM1D-AS1 NA
ENSG00000140044 122953 Jun dimerization protein 2 NA JDP2 NA
# Save the Ensembl IDs queried for factor 6, one per line.
write.table(out$query,
            file = paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 6, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
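
The Factor 6 table contains several queries flagged `notfound = TRUE`, and the column order of the printed tables differs from one factor to the next. A minimal sketch of how both could be handled before calling `kable()` follows. `tidy_annotations()` is a hypothetical helper, not part of the original analysis, and it assumes the column names shown in the printed tables (`query`, `X_id`, `name`, `summary`, `symbol`, `notfound`). The demo rows are taken from the Factor 6 table, with the `summary` column left out for brevity.

```r
# Hypothetical helper (not in the original analysis): put the annotation columns
# in a fixed order and split off the queries that returned no hit.
tidy_annotations <- function(out_df,
                             cols = c("query", "symbol", "name", "X_id", "summary")) {
  # Columns absent from a result (e.g. no "notfound" column when every ID
  # matched) are added and filled with NA.
  for (col in setdiff(c(cols, "notfound"), names(out_df))) {
    out_df[[col]] <- NA
  }
  missing <- !is.na(out_df$notfound) & out_df$notfound
  list(found = out_df[!missing, cols, drop = FALSE],
       notfound = out_df$query[missing])
}

# Three rows from the Factor 6 table; "summary" is omitted here and filled with NA.
demo <- data.frame(
  query = c("ENSG00000108342", "ENSG00000255813", "ENSG00000004399"),
  X_id = c("1440", NA, "23129"),
  name = c("colony stimulating factor 3", NA, "plexin D1"),
  symbol = c("CSF3", NA, "PLXND1"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)
res <- tidy_annotations(demo)
res$notfound  # IDs with no annotation
```

In the report, `res <- tidy_annotations(as.data.frame(out))` followed by `kable(res$found)` would give every factor the same column layout, and `res$notfound` would list the IDs that may need a lookup against a different annotation release.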

Factor 7 Annotations

out <- mygene::queryMany(gene_list[7, ], scopes = "ensembl.gene", fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query symbol summary X_id name notfound
ENSG00000114854 TNNC1 Troponin is a central regulatory protein of striated muscle contraction, and together with tropomyosin, is located on the actin filament. Troponin consists of 3 subunits: TnI, which is the inhibitor of actomyosin ATPase; TnT, which contains the binding site for tropomyosin; and TnC, the protein encoded by this gene. The binding of calcium to TnC abolishes the inhibitory action of TnI, thus allowing the interaction of actin with myosin, the hydrolysis of ATP, and the generation of tension. Mutations in this gene are associated with cardiomyopathy dilated type 1Z. 7134 troponin C1, slow skeletal and cardiac type NA
ENSG00000172399 MYOZ2 The protein encoded by this gene belongs to a family of sarcomeric proteins that bind to calcineurin, a phosphatase involved in calcium-dependent signal transduction in diverse cell types. These family members tether calcineurin to alpha-actinin at the z-line of the sarcomere of cardiac and skeletal muscle cells, and thus they are important for calcineurin signaling. Mutations in this gene cause cardiomyopathy familial hypertrophic type 16, a hereditary heart disorder. 51778 myozenin 2 NA
ENSG00000101210 EEF1A2 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 2) is expressed in brain, heart and skeletal muscle, and the other isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas. This gene may be critical in the development of ovarian cancer. 1917 eukaryotic translation elongation factor 1 alpha 2 NA
ENSG00000103316 CRYM Crystallins are separated into two classes: taxon-specific and ubiquitous. The former class is also called phylogenetically-restricted crystallins. The latter class constitutes the major proteins of vertebrate eye lens and maintains the transparency and refractive index of the lens. This gene encodes a taxon-specific crystallin protein that binds NADPH and has sequence similarity to bacterial ornithine cyclodeaminases. The encoded protein does not perform a structural role in lens tissue, and instead it binds thyroid hormone for possible regulatory or developmental roles. Mutations in this gene have been associated with autosomal dominant non-syndromic deafness. 1428 crystallin mu NA
ENSG00000057294 PKP2 This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This gene product may regulate the signaling activity of beta-catenin. Two alternately spliced transcripts encoding two protein isoforms have been identified. A processed pseudogene with high similarity to this locus has been mapped to chromosome 12p13. 5318 plakophilin 2 NA
ENSG00000130528 HRC This gene encodes a luminal sarcoplasmic reticulum protein identified by its ability to bind low-density lipoprotein with high affinity. The protein interacts with the cytoplasmic domain of triadin, the main transmembrane protein of the junctional sarcoplasmic reticulum (SR) of skeletal muscle. The protein functions in the regulation of releasable calcium into the SR. 3270 histidine rich calcium binding protein NA
ENSG00000126882 FAM78A NA 286336 family with sequence similarity 78 member A NA
ENSG00000173991 TCAP Sarcomere assembly is regulated by the muscle protein titin. Titin is a giant elastic protein with kinase activity that extends half the length of a sarcomere. It serves as a scaffold to which myofibrils and other muscle related proteins are attached. This gene encodes a protein found in striated and cardiac muscle that binds to the titin Z1-Z2 domains and is a substrate of titin kinase, interactions thought to be critical to sarcomere assembly. Mutations in this gene are associated with limb-girdle muscular dystrophy type 2G. 8557 titin-cap NA
ENSG00000075702 WDR62 This gene is proposed to play a role in cerebral cortical development. Mutations in this gene have been associated with microencephaly, cortical malformations, and mental retardation. Alternative splicing results in multiple transcript variants. 284403 WD repeat domain 62 NA
ENSG00000107317 PTGDS The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may also be involved in the regulation of non-rapid eye movement sleep. 5730 prostaglandin D2 synthase NA
ENSG00000077522 ACTN2 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a muscle-specific, alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Several transcript variants encoding different isoforms have been found for this gene. 88 actinin alpha 2 NA
ENSG00000078814 MYH7B The myosin II molecule is a multi-subunit complex consisting of two heavy chains and four light chains. This gene encodes a heavy chain of myosin II, which is a member of the motor-domain superfamily. The heavy chain includes a globular motor domain, which catalyzes ATP hydrolysis and interacts with actin, and a tail domain in which heptad repeat sequences promote dimerization by interacting to form a rod-like alpha-helical coiled coil. This heavy chain subunit is a slow-twitch myosin. Alternatively spliced transcript variants have been found, but the full-length nature of these variants is not determined. 57644 myosin, heavy chain 7B, cardiac muscle, beta NA
ENSG00000159251 ACTC1 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). 70 actin, alpha, cardiac muscle 1 NA
ENSG00000247134 RP11-11N9.4 NA ENSG00000247134 NA NA
ENSG00000132821 VSTM2L NA 128434 V-set and transmembrane domain containing 2 like NA
ENSG00000118729 CASQ2 The protein encoded by this gene specifies the cardiac muscle family member of the calsequestrin family. Calsequestrin is localized to the sarcoplasmic reticulum in cardiac and slow skeletal muscle cells. The protein is a calcium binding protein that stores calcium for muscle function. Mutations in this gene cause stress-induced polymorphic ventricular tachycardia, also referred to as catecholaminergic polymorphic ventricular tachycardia 2 (CPVT2), a disease characterized by bidirectional ventricular tachycardia that may lead to cardiac arrest. 845 calsequestrin 2 NA
ENSG00000122367 LDB3 This gene encodes a PDZ domain-containing protein. PDZ motifs are modular protein-protein interaction domains consisting of 80-120 amino acid residues. PDZ domain-containing proteins interact with each other in cytoskeletal assembly or with other proteins involved in targeting and clustering of membrane proteins. The protein encoded by this gene interacts with alpha-actinin-2 through its N-terminal PDZ domain and with protein kinase C via its C-terminal LIM domains. The LIM domain is a cysteine-rich motif defined by 50-60 amino acids containing two zinc-binding modules. This protein also interacts with all three members of the myozenin family. Mutations in this gene have been associated with myofibrillar myopathy and dilated cardiomyopathy. Alternatively spliced transcript variants encoding different isoforms have been identified; all isoforms have N-terminal PDZ domains while only longer isoforms (1, 2 and 5) have C-terminal LIM domains. 11155 LIM domain binding 3 NA
ENSG00000181856 SLC2A4 This gene is a member of the solute carrier family 2 (facilitated glucose transporter) family and encodes a protein that functions as an insulin-regulated facilitative glucose transporter. In the absence of insulin, this integral membrane protein is sequestered within the cells of muscle and adipose tissue. Within minutes of insulin stimulation, the protein moves to the cell surface and begins to transport glucose across the cell membrane. Mutations in this gene have been associated with noninsulin-dependent diabetes mellitus (NIDDM). 6517 solute carrier family 2 member 4 NA
ENSG00000178343 SHISA3 NA 152573 shisa family member 3 NA
ENSG00000123610 TNFAIP6 The protein encoded by this gene is a secretory protein that contains a hyaluronan-binding domain, and thus is a member of the hyaluronan-binding protein family. The hyaluronan-binding domain is known to be involved in extracellular matrix stability and cell migration. This protein has been shown to form a stable complex with inter-alpha-inhibitor (I alpha I), and thus enhance the serine protease inhibitory activity of I alpha I, which is important in the protease network associated with inflammation. This gene can be induced by proinflammatory cytokines such as tumor necrosis factor alpha and interleukin-1. Enhanced levels of this protein are found in the synovial fluid of patients with osteoarthritis and rheumatoid arthritis. 7130 TNF alpha induced protein 6 NA
ENSG00000160097 FNDC5 This gene encodes a secreted protein that is released from muscle cells during exercise. The encoded protein may participate in the development of brown fat. Translation of the precursor protein initiates at a non-AUG start codon at a position that is conserved as an AUG start codon in other organisms. Alternative splicing results in multiple transcript variants. 252995 fibronectin type III domain containing 5 NA
ENSG00000239775 AC017116.11 NA ENSG00000239775 NA NA
ENSG00000164530 PI16 NA 221476 peptidase inhibitor 16 NA
ENSG00000062524 LTK The protein encoded by this gene is a member of the ros/insulin receptor family of tyrosine kinases. Tyrosine-specific phosphorylation of proteins is a key to the control of diverse pathways leading to cell growth and differentiation. Multiple transcript variants encoding different isoforms have been found for this gene. 4058 leukocyte receptor tyrosine kinase NA
ENSG00000134775 FHOD3 The protein encoded by this gene is a member of the diaphanous-related formins (DRF), and contains multiple domains, including GBD (GTPase-binding domain), DID (diaphanous inhibitory domain), FH1 (formin homology 1), FH2 (formin homology 2), and DAD (diaphanous auto-regulatory domain) domains. This protein is thought to play a role in actin filament polymerization in cardiomyocytes. Mutations in this gene have been associated with dilated cardiomyopathy (DCM), characterized by dilation of the ventricular chamber, leading to impairment of systolic pump function and subsequent heart failure. Increased levels of the protein encoded by this gene have been observed in individuals with hypertrophic cardiomyopathy (HCM). Alternative splicing results in multiple transcript variants encoding different isoforms. A muscle-specific isoform has been shown to possess a casein kinase 2 (CK2) phosphorylation site at the C-terminal end of the FH2 domain. Phosphorylation of this site alters its interaction with sequestosome 1 (SQSTM1), and targets this isoform to myofibrils, while other isoforms form cytoplasmic aggregates. 80206 formin homology 2 domain containing 3 NA
ENSG00000064201 TSPAN32 This gene, which is a member of the tetraspanin superfamily, is one of several tumor-suppressing subtransferable fragments located in the imprinted gene domain of chromosome 11p15.5, an important tumor-suppressor gene region. Alterations in this region have been associated with Beckwith-Wiedemann syndrome, Wilms tumor, rhabdomyosarcoma, adrenocortical carcinoma, and lung, ovarian and breast cancers. This gene is located among several imprinted genes; however, this gene, as well as the tumor-suppressing subchromosomal transferable fragment 4, escapes imprinting. This gene may play a role in malignancies and diseases that involve this region, and it is also involved in hematopoietic cell function. Alternatively spliced transcript variants have been described, but their biological validity has not been determined. 10077 tetraspanin 32 NA
ENSG00000114200 BCHE Mutant alleles at the BCHE locus are responsible for suxamethonium sensitivity. Homozygous persons sustain prolonged apnea after administration of the muscle relaxant suxamethonium in connection with surgical anesthesia. The activity of pseudocholinesterase in the serum is low and its substrate behavior is atypical. In the absence of the relaxant, the homozygote is at no known disadvantage. 590 butyrylcholinesterase NA
ENSG00000175445 LPL LPL encodes lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue. LPL functions as a homodimer, and has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Severe mutations that cause LPL deficiency result in type I hyperlipoproteinemia, while less extreme mutations in LPL are linked to many disorders of lipoprotein metabolism. 4023 lipoprotein lipase NA
ENSG00000181800 CELF2-AS1 NA 414196 CELF2 antisense RNA 1 NA
ENSG00000127863 TNFRSF19 The protein encoded by this gene is a member of the TNF-receptor superfamily. This receptor is highly expressed during embryonic development. It has been shown to interact with TRAF family members, and to activate JNK signaling pathway when overexpressed in cells. This receptor is capable of inducing apoptosis by a caspase-independent mechanism, and it is thought to play an essential role in embryonic development. Alternatively spliced transcript variants encoding distinct isoforms have been described. 55504 tumor necrosis factor receptor superfamily member 19 NA
ENSG00000178053 MLF1 This gene encodes an oncoprotein which is thought to play a role in the phenotypic determination of hematopoietic cells. Translocations between this gene and nucleophosmin have been associated with myelodysplastic syndrome and acute myeloid leukemia. Multiple transcript variants encoding different isoforms have been found for this gene. 4291 myeloid leukemia factor 1 NA
ENSG00000197299 BLM The Bloom syndrome gene product is related to the RecQ subset of DExH box-containing DNA helicases and has both DNA-stimulated ATPase and ATP-dependent DNA helicase activities. Mutations causing Bloom syndrome delete or alter helicase motifs and may disable the 3’-5’ helicase activity. The normal protein may act to suppress inappropriate recombination. 641 Bloom syndrome RecQ like helicase NA
ENSG00000167723 TRPV3 This gene product belongs to a family of nonselective cation channels that function in a variety of processes, including temperature sensation and vasoregulation. The thermosensitive members of this family are expressed in subsets of sensory neurons that terminate in the skin, and are activated at distinct physiological temperatures. This channel is activated at temperatures between 22 and 40 degrees C. This gene lies in close proximity to another family member gene on chromosome 17, and the two encoded proteins are thought to associate with each other to form heteromeric channels. Multiple transcript variants encoding different isoforms have been found for this gene. 162514 transient receptor potential cation channel subfamily V member 3 NA
ENSG00000128578 STRIP2 NA 57464 striatin interacting protein 2 NA
ENSG00000157570 TSPAN18 NA 90139 tetraspanin 18 NA
ENSG00000145362 ANK2 This gene encodes a member of the ankyrin family of proteins that link the integral membrane proteins to the underlying spectrin-actin cytoskeleton. Ankyrins play key roles in activities such as cell motility, activation, proliferation, contact and the maintenance of specialized membrane domains. Most ankyrins are typically composed of three structural domains: an amino-terminal domain containing multiple ankyrin repeats; a central region with a highly conserved spectrin binding domain; and a carboxy-terminal regulatory domain which is the least conserved and subject to variation. The protein encoded by this gene is required for targeting and stability of Na/Ca exchanger 1 in cardiomyocytes. Mutations in this gene cause long QT syndrome 4 and cardiac arrhythmia syndrome. Multiple transcript variants encoding different isoforms have been described. 287 ankyrin 2, neuronal NA
ENSG00000108823 SGCA This gene encodes a component of the dystrophin-glycoprotein complex (DGC), which is critical to the stability of muscle fiber membranes and to the linking of the actin cytoskeleton to the extracellular matrix. Its expression is thought to be restricted to striated muscle. Mutations in this gene result in type 2D autosomal recessive limb-girdle muscular dystrophy. Multiple transcript variants encoding different isoforms have been found for this gene. 6442 sarcoglycan alpha NA
ENSG00000267060 PTGES3L NA 100885848 prostaglandin E synthase 3 (cytosolic)-like NA
ENSG00000020577 SAMD4A Sterile alpha motifs (SAMs) in proteins such as SAMD4A are part of an RNA-binding domain that functions as a posttranscriptional regulator by binding to an RNA sequence motif known as the Smaug recognition element, which was named after the Drosophila Smaug protein (Baez and Boccaccio, 2005 [PubMed 16221671]). 23034 sterile alpha motif domain containing 4A NA
ENSG00000136378 ADAMTS7 The protein encoded by this gene is a member of the ADAMTS (a disintegrin and metalloproteinase with thrombospondin motifs) family. Members of this family share several distinct protein modules, including a propeptide region, a metalloproteinase domain, a disintegrin-like domain, and a thrombospondin type 1 (TS) motif. Individual members of this family differ in the number of C-terminal TS motifs, and some have unique C-terminal domains. The encoded preproprotein is proteolytically processed to generate the mature enzyme. This enzyme contains two C-terminal TS motifs and may regulate vascular smooth muscle cell (VSMC) migration. Mutations in this gene may be associated with susceptibility to coronary artery disease. 11173 ADAM metallopeptidase with thrombospondin type 1 motif 7 NA
ENSG00000229164 NA NA NA NA TRUE
ENSG00000137392 CLPS The protein encoded by this gene is a cofactor needed by pancreatic lipase for efficient dietary lipid hydrolysis. It binds to the C-terminal, non-catalytic domain of lipase, thereby stabilizing an active conformation and considerably increasing the overall hydrophobic binding site. The gene product allows lipase to anchor noncovalently to the surface of lipid micelles, counteracting the destabilizing influence of intestinal bile salts. This cofactor is only expressed in pancreatic acinar cells, suggesting regulation of expression by tissue-specific elements. Three transcript variants encoding different isoforms have been found for this gene. 1208 colipase NA
ENSG00000227242 NBPF13P NA ENSG00000227242 neuroblastoma breakpoint family member 13, pseudogene NA
ENSG00000165071 TMEM71 NA 137835 transmembrane protein 71 NA
ENSG00000050767 COL23A1 COL23A1 is a member of the transmembrane collagens, a subfamily of the nonfibrillar collagens that contain a single pass hydrophobic transmembrane domain (Banyard et al., 2003 [PubMed 12644459]). 91522 collagen type XXIII alpha 1 chain NA
ENSG00000007237 GAS7 Growth arrest-specific 7 is expressed primarily in terminally differentiated brain cells and predominantly in mature cerebellar Purkinje neurons. GAS7 plays a putative role in neuronal development. Several transcript variants encoding proteins which vary in the N-terminus have been described. 8522 growth arrest specific 7 NA
ENSG00000119686 FLVCR2 This gene encodes a member of the major facilitator superfamily. The encoded transmembrane protein is a calcium transporter. Unlike the related protein feline leukemia virus subgroup C receptor 1, the protein encoded by this locus does not bind to feline leukemia virus subgroup C envelope protein. The encoded protein may play a role in development of brain vascular endothelial cells, as mutations at this locus have been associated with proliferative vasculopathy and hydranencephaly-hydrocephaly syndrome. Alternatively spliced transcript variants have been described. 55640 feline leukemia virus subgroup C cellular receptor family member 2 NA
ENSG00000115641 FHL2 This gene encodes a member of the four-and-a-half-LIM-only protein family. Family members contain two highly conserved, tandemly arranged, zinc finger domains with four highly conserved cysteines binding a zinc atom in each zinc finger. This protein is thought to have a role in the assembly of extracellular membranes. Also, this gene is down-regulated during transformation of normal myoblasts to rhabdomyosarcoma cells and the encoded protein may function as a link between presenilin-2 and an intracellular signaling pathway. Multiple alternatively spliced variants encoding different isoforms have been identified. 2274 four and a half LIM domains 2 NA
ENSG00000109099 PMP22 This gene encodes an integral membrane protein that is a major component of myelin in the peripheral nervous system. Studies suggest two alternately used promoters drive tissue-specific expression. Various mutations of this gene are causes of Charcot-Marie-Tooth disease Type IA, Dejerine-Sottas syndrome, and hereditary neuropathy with liability to pressure palsies. Alternative splicing results in multiple transcript variants. 5376 peripheral myelin protein 22 NA
ENSG00000140403 DNAJA4 NA 55466 DnaJ heat shock protein family (Hsp40) member A4 NA
ENSG00000151640 DPYSL4 NA 10570 dihydropyrimidinase like 4 NA
ENSG00000272418 RP11-762H8.4 NA ENSG00000272418 NA NA
ENSG00000141576 RNF157 NA 114804 ring finger protein 157 NA
ENSG00000251196 RP11-54F2.1 NA ENSG00000251196 NA NA
ENSG00000048740 CELF2 Members of the CELF/BRUNOL protein family contain two N-terminal RNA recognition motif (RRM) domains, one C-terminal RRM domain, and a divergent segment of 160-230 aa between the second and third RRM domains. Members of this protein family regulate pre-mRNA alternative splicing and may also be involved in mRNA editing, and translation. Alternative splicing results in multiple transcript variants encoding different isoforms. 10659 CUGBP, Elav-like family member 2 NA
ENSG00000198523 PLN The protein encoded by this gene is found as a pentamer and is a major substrate for the cAMP-dependent protein kinase in cardiac muscle. The encoded protein is an inhibitor of cardiac muscle sarcoplasmic reticulum Ca(2+)-ATPase in the unphosphorylated state, but inhibition is relieved upon phosphorylation of the protein. The subsequent activation of the Ca(2+) pump leads to enhanced muscle relaxation rates, thereby contributing to the inotropic response elicited in heart by beta-agonists. The encoded protein is a key regulator of cardiac diastolic function. Mutations in this gene are a cause of inherited human dilated cardiomyopathy with refractory congestive heart failure, and also familial hypertrophic cardiomyopathy. 5350 phospholamban NA
ENSG00000135540 NHSL1 NA 57224 NHS like 1 NA
ENSG00000122877 EGR2 The protein encoded by this gene is a transcription factor with three tandem C2H2-type zinc fingers. Defects in this gene are associated with Charcot-Marie-Tooth disease type 1D (CMT1D), Charcot-Marie-Tooth disease type 4E (CMT4E), and with Dejerine-Sottas syndrome (DSS). Multiple transcript variants encoding two different isoforms have been found for this gene. 1959 early growth response 2 NA
ENSG00000152137 HSPB8 The protein encoded by this gene belongs to the superfamily of small heat-shock proteins containing a conservative alpha-crystallin domain at the C-terminal part of the molecule. The expression of this gene is induced by estrogen in estrogen receptor-positive breast cancer cells, and this protein also functions as a chaperone in association with Bag3, a stimulator of macroautophagy. Thus, this gene appears to be involved in regulation of cell proliferation, apoptosis, and carcinogenesis, and mutations in this gene have been associated with different neuromuscular diseases, including Charcot-Marie-Tooth disease. 26353 heat shock protein family B (small) member 8 NA
ENSG00000171241 SHCBP1 NA 79801 SHC binding and spindle associated 1 NA
ENSG00000163376 KBTBD8 NA 84541 kelch repeat and BTB domain containing 8 NA
ENSG00000225630 MTND2P28 NA ENSG00000225630 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 2 pseudogene 28 NA
ENSG00000169508 GPR183 This gene was identified by the up-regulation of its expression upon Epstein-Barr virus infection of primary B lymphocytes. This gene is predicted to encode a G protein-coupled receptor that is most closely related to the thrombin receptor. Expression of this gene was detected in B-lymphocyte cell lines and lymphoid tissues but not in T-lymphocyte cell lines or peripheral blood T lymphocytes. The function of this gene is unknown. 1880 G protein-coupled receptor 183 NA
ENSG00000229732 AC019349.5 NA ENSG00000229732 NA NA
ENSG00000142615 CELA2A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Like most of the human elastases, elastase 2A is secreted from the pancreas as a zymogen. In other species, elastase 2A has been shown to preferentially cleave proteins after leucine, methionine, and phenylalanine residues. 63036 chymotrypsin like elastase family member 2A NA
ENSG00000197614 MFAP5 This gene encodes a 25-kD microfibril-associated glycoprotein which is a component of microfibrils of the extracellular matrix. The encoded protein promotes attachment of cells to microfibrils via alpha-V-beta-3 integrin. Deficiency of this gene in mice results in neutropenia. Alternate splicing results in multiple transcript variants encoding different isoforms. 8076 microfibrillar associated protein 5 NA
ENSG00000114378 HYAL1 This gene encodes a lysosomal hyaluronidase. Hyaluronidases intracellularly degrade hyaluronan, one of the major glycosaminoglycans of the extracellular matrix. Hyaluronan is thought to be involved in cell proliferation, migration and differentiation. This enzyme is active at an acidic pH and is the major hyaluronidase in plasma. Mutations in this gene are associated with mucopolysaccharidosis type IX, or hyaluronidase deficiency. The gene is one of several related genes in a region of chromosome 3p21.3 associated with tumor suppression. Multiple transcript variants encoding different isoforms have been found for this gene. 3373 hyaluronoglucosaminidase 1 NA
ENSG00000198771 RCSD1 NA 92241 RCSD domain containing 1 NA
ENSG00000225972 MTND1P23 NA ENSG00000225972 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 1 pseudogene 23 NA
ENSG00000160883 HK3 Hexokinases phosphorylate glucose to produce glucose-6-phosphate, the first step in most glucose metabolism pathways. This gene encodes hexokinase 3. Similar to hexokinases 1 and 2, this allosteric enzyme is inhibited by its product glucose-6-phosphate. 3101 hexokinase 3 NA
ENSG00000182333 LIPF This gene encodes gastric lipase, an enzyme involved in the digestion of dietary triglycerides in the gastrointestinal tract, and responsible for 30% of fat digestion processes occurring in human. It is secreted by gastric chief cells in the fundic mucosa of the stomach, and it hydrolyzes the ester bonds of triglycerides under acidic pH conditions. The gene is a member of a conserved gene family of lipases that play distinct roles in neutral lipid metabolism. Several transcript variants encoding different isoforms have been found for this gene. 8513 lipase F, gastric type NA
ENSG00000077585 GPR137B NA 7107 G protein-coupled receptor 137B NA
ENSG00000178814 OPLAH The protein encoded by this gene acts as a homodimer, using ATP hydrolysis to catalyze the conversion of 5-oxo-L-proline to L-glutamate. Defects in this gene are a cause of 5-oxoprolinase deficiency (OPLAHD). 26873 5-oxoprolinase (ATP-hydrolysing) NA
ENSG00000163710 PCOLCE2 NA 26577 procollagen C-endopeptidase enhancer 2 NA
ENSG00000114948 ADAM23 This gene encodes a member of the ADAM (a disintegrin and metalloprotease domain) family. Members of this family are membrane-anchored proteins structurally related to snake venom disintegrins and have been implicated in a variety of biological processes involving cell-cell and cell-matrix interactions, including fertilization, muscle development, and neurogenesis. It is reported that inactivation of this gene is associated with tumorigenesis in human cancers. 8745 ADAM metallopeptidase domain 23 NA
ENSG00000120049 KCNIP2 This gene encodes a member of the family of voltage-gated potassium (Kv) channel-interacting proteins (KCNIPs), which belongs to the recoverin branch of the EF-hand superfamily. Members of the KCNIP family are small calcium binding proteins. They all have EF-hand-like domains, and differ from each other in the N-terminus. They are integral subunit components of native Kv4 channel complexes. They may regulate A-type currents, and hence neuronal excitability, in response to changes in intracellular calcium. Multiple alternatively spliced transcript variants encoding distinct isoforms have been identified from this gene. 30819 potassium voltage-gated channel interacting protein 2 NA
ENSG00000196951 SCOC-AS1 NA 100129858 SCOC antisense RNA 1 NA
ENSG00000168477 TNXB This gene encodes a member of the tenascin family of extracellular matrix glycoproteins. The tenascins have anti-adhesive effects, as opposed to fibronectin which is adhesive. This protein is thought to function in matrix maturation during wound healing, and its deficiency has been associated with the connective tissue disorder Ehlers-Danlos syndrome. This gene localizes to the major histocompatibility complex (MHC) class III region on chromosome 6. It is one of four genes in this cluster which have been duplicated. The duplicated copy of this gene is incomplete and is a pseudogene which is transcribed but does not encode a protein. The structure of this gene is unusual in that it overlaps the CREBL1 and CYP21A2 genes at its 5’ and 3’ ends, respectively. Multiple transcript variants encoding different isoforms have been found for this gene. 7148 tenascin XB NA
ENSG00000172867 KRT2 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3849 keratin 2 NA
ENSG00000113296 THBS4 The protein encoded by this gene belongs to the thrombospondin protein family. Thrombospondin family members are adhesive glycoproteins that mediate cell-to-cell and cell-to-matrix interactions. This protein forms a pentamer and can bind to heparin and calcium. It is involved in local signaling in the developing and adult nervous system, and it contributes to spinal sensitization and neuropathic pain states. This gene is activated during the stromal response to invasive breast cancer. It may also play a role in inflammatory responses in Alzheimer’s disease. Alternative splicing results in multiple transcript variants. 7060 thrombospondin 4 NA
ENSG00000154721 JAM2 This gene belongs to the immunoglobulin superfamily, and the junctional adhesion molecule (JAM) family. The protein encoded by this gene is a type I membrane protein that is localized at the tight junctions of both epithelial and endothelial cells. It acts as an adhesive ligand for interacting with a variety of immune cell types, and may play a role in lymphocyte homing to secondary lymphoid organs. Alternatively spliced transcript variants have been found for this gene. 58494 junctional adhesion molecule 2 NA
ENSG00000124701 APOBEC2 NA 10930 apolipoprotein B mRNA editing enzyme catalytic subunit 2 NA
ENSG00000143536 CRNN This gene encodes a member of the ‘fused gene’ family of proteins, which contain N-terminus EF-hand domains and multiple tandem peptide repeats. The encoded protein contains two EF-hand Ca2+ binding domains in its N-terminus and two glutamine- and threonine-rich 60 amino acid repeats in its C-terminus. This gene, also known as squamous epithelial heat shock protein 53, may play a role in the mucosal/epithelial immune response and epidermal differentiation. 49860 cornulin NA
ENSG00000160539 PLPP7 NA 84814 phospholipid phosphatase 7 (inactive) NA
ENSG00000197852 LOC101928718 NA 101928718 uncharacterized LOC101928718 NA
ENSG00000197852 FAM212B NA 55924 family with sequence similarity 212 member B NA
ENSG00000198756 COLGALT2 NA 23127 collagen beta(1-O)galactosyltransferase 2 NA
ENSG00000156381 ANKRD9 NA 122416 ankyrin repeat domain 9 NA
ENSG00000204794 NA NA NA NA TRUE
ENSG00000171033 PKIA The protein encoded by this gene is a member of the cAMP-dependent protein kinase (PKA) inhibitor family. This protein was demonstrated to interact with and inhibit the activities of both C alpha and C beta catalytic subunits of the PKA. Alternatively spliced transcript variants encoding the same protein have been reported. 5569 protein kinase (cAMP-dependent, catalytic) inhibitor alpha NA
ENSG00000124772 CPNE5 Calcium-dependent membrane-binding proteins may regulate molecular events at the interface of the cell membrane and cytoplasm. This gene is one of several genes that encode a calcium-dependent protein containing two N-terminal type II C2 domains and an integrin A domain-like sequence in the C-terminus. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene. More variants may exist, but their full-length natures could not be determined. 57699 copine 5 NA
ENSG00000106565 TMEM176B NA 28959 transmembrane protein 176B NA
ENSG00000134548 SPX The protein encoded by this gene is a hormone involved in modulation of cardiovascular and renal function. It has also been shown in rats to cause weight loss. Several transcript variants have been found for this gene. 80763 spexin hormone NA
ENSG00000145506 NKD2 This gene encodes a member of a family of proteins that function as negative regulators of Wnt receptor signaling through interaction with Dishevelled family members. The encoded protein participates in the delivery of transforming growth factor alpha-containing vesicles to the cell membrane. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 85409 naked cuticle homolog 2 NA
ENSG00000164309 CMYA5 NA 202333 cardiomyopathy associated 5 NA
ENSG00000089041 P2RX7 The product of this gene belongs to the family of purinoceptors for ATP. This receptor functions as a ligand-gated ion channel and is responsible for ATP-dependent lysis of macrophages through the formation of membrane pores permeable to large molecules. Activation of this nuclear receptor by ATP in the cytoplasm may be a mechanism by which cellular activity can be coupled to changes in gene expression. Multiple alternatively spliced variants have been identified, most of which fit nonsense-mediated decay (NMD) criteria. 5027 purinergic receptor P2X 7 NA
ENSG00000130294 KIF1A The protein encoded by this gene is a member of the kinesin family and functions as an anterograde motor protein that transports membranous organelles along axonal microtubules. Mutations at this locus have been associated with spastic paraplegia-30 and hereditary sensory neuropathy IIC. Alternatively spliced transcript variants encoding distinct isoforms have been described. 547 kinesin family member 1A NA
ENSG00000059691 GATB NA 5188 glutamyl-tRNA(Gln) amidotransferase, subunit B NA
ENSG00000068976 PYGM This gene encodes a muscle enzyme involved in glycogenolysis. Highly similar enzymes encoded by different genes are found in liver and brain. Mutations in this gene are associated with McArdle disease (myophosphorylase deficiency), a glycogen storage disease of muscle. Alternative splicing results in multiple transcript variants. 5837 phosphorylase, glycogen, muscle NA
ENSG00000197815 RP1-253P7.4 NA ENSG00000197815 NA NA
ENSG00000006327 TNFRSF12A NA 51330 tumor necrosis factor receptor superfamily member 12A NA
write.table(out$query,
            file = paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 7, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
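
The table above lists one query ID twice (ENSG00000197852, matched to both LOC101928718 and FAM212B) and one ID with notfound = TRUE (ENSG00000204794), and all of these are written to the gene-name file unchanged. The following is a minimal sketch, assuming one would rather write each matched ID once. `clean_query_ids` is a hypothetical helper, not part of mygene or the SFA scripts, and the toy data frame only stands in for `as.data.frame(out)`.

```r
# Hypothetical helper (not part of mygene or the SFA scripts): return the
# unique query IDs that were matched, preserving their original order.
clean_query_ids <- function(res) {
  # `notfound` is NA for matched queries and TRUE for unmatched ones;
  # the column may be absent when every query was matched.
  if ("notfound" %in% names(res)) {
    res <- res[is.na(res$notfound) | !res$notfound, , drop = FALSE]
  }
  unique(as.character(res$query))
}

# Toy stand-in for as.data.frame(out), using three rows from the table above.
toy <- data.frame(
  query    = c("ENSG00000197852", "ENSG00000197852", "ENSG00000204794"),
  symbol   = c("LOC101928718", "FAM212B", NA),
  notfound = c(NA, NA, TRUE),
  stringsAsFactors = FALSE
)

ids <- clean_query_ids(toy)
print(ids)
```

Under that assumption, `clean_query_ids(as.data.frame(out))` could be passed to `write.table` in place of `out$query`. Whether unmatched IDs should be dropped depends on how the downstream files are used.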

Factor 8 Annotations

out <- mygene::queryMany(gene_list[8,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name X_id summary symbol query notfound
chromosome 10 open reading frame 10 11067 The expression of this gene is induced by fasting as well as by progesterone. The protein encoded by this gene contains a t-synaptosome-associated protein receptor (SNARE) coiled-coil homology domain and a peroxisomal targeting signal. Production of the encoded protein leads to phosphorylation and activation of the transcription factor ELK1. C10orf10 ENSG00000165507 NA
small nucleolar RNA host gene 25 ENSG00000266402 NA SNHG25 ENSG00000266402 NA
C-X-C motif chemokine ligand 2 2920 This antimicrobial gene is part of a chemokine superfamily that encodes secreted proteins involved in immunoregulatory and inflammatory processes. The superfamily is divided into four subfamilies based on the arrangement of the N-terminal cysteine residues of the mature peptide. This chemokine, a member of the CXC subfamily, is expressed at sites of inflammation and may suppress hematopoietic progenitor cell proliferation. CXCL2 ENSG00000081041 NA
nuclear protein 1, transcriptional regulator 26471 NA NUPR1 ENSG00000176046 NA
heat shock protein family D (Hsp60) member 1 pseudogene 1 ENSG00000213430 NA HSPD1P1 ENSG00000213430 NA
ERBB receptor feedback inhibitor 1 54206 ERRFI1 is a cytoplasmic protein whose expression is upregulated with cell growth (Wick et al., 1995 [PubMed 7641805]). It shares significant homology with the protein product of rat gene-33, which is induced during cell stress and mediates cell signaling (Makkinje et al., 2000 [PubMed 10749885]; Fiorentino et al., 2000 [PubMed 11003669]). ERRFI1 ENSG00000116285 NA
metallothionein 1M 4499 This gene encodes a member of the metallothionein superfamily, type 1 family. Metallothioneins have a high content of cysteine residues that bind various heavy metals. These genes are transcriptionally regulated by both heavy metals and glucocorticoids. MT1M ENSG00000205364 NA
NEDD4 binding protein 2-like 1 90634 NA N4BP2L1 ENSG00000139597 NA
NA ENSG00000269926 NA RP11-442H21.2 ENSG00000269926 NA
angiopoietin like 4 51129 This gene encodes a glycosylated, secreted protein containing a C-terminal fibrinogen domain. The encoded protein is induced by peroxisome proliferation activators and functions as a serum hormone that regulates glucose homeostasis, lipid metabolism, and insulin sensitivity. This protein can also act as an apoptosis survival factor for vascular endothelial cells and can prevent metastasis by inhibiting vascular growth and tumor cell invasion. The C-terminal domain may be proteolytically-cleaved from the full-length secreted protein. Decreased expression of this gene has been associated with type 2 diabetes. Alternative splicing results in multiple transcript variants. This gene was previously referred to as ANGPTL2 but has been renamed ANGPTL4. ANGPTL4 ENSG00000167772 NA
cystathionine gamma-lyase 1491 This gene encodes a cytoplasmic enzyme in the trans-sulfuration pathway that converts cystathionine derived from methionine into cysteine. Glutathione synthesis in the liver is dependent upon the availability of cysteine. Mutations in this gene cause cystathioninuria. Alternative splicing of this gene results in three transcript variants encoding different isoforms. CTH ENSG00000116761 NA
insulin induced gene 1 3638 Oxysterols regulate cholesterol homeostasis through the liver X receptor (LXR)- and sterol regulatory element-binding protein (SREBP)-mediated signaling pathways. This gene is an insulin-induced gene. It encodes an endoplasmic reticulum (ER) membrane protein that plays a critical role in regulating cholesterol concentrations in cells. This protein binds to the sterol-sensing domains of SREBP cleavage-activating protein (SCAP) and HMG CoA reductase, and is essential for the sterol-mediated trafficking of the two proteins. Alternatively spliced transcript variants encoding distinct isoforms have been observed. INSIG1 ENSG00000186480 NA
DNA damage inducible transcript 4 54541 NA DDIT4 ENSG00000168209 NA
BHLHE40 antisense RNA 1 100507582 NA BHLHE40-AS1 ENSG00000235831 NA
l(3)mbt-like 4 (Drosophila) 91133 NA L3MBTL4 ENSG00000154655 NA
membrane palmitoylated protein 6 51678 Members of the peripheral membrane-associated guanylate kinase (MAGUK) family function in tumor suppression and receptor clustering by forming multiprotein complexes containing distinct sets of transmembrane, cytoskeletal, and cytoplasmic signaling proteins. All MAGUKs contain a PDZ-SH3-GUK core and are divided into 4 subfamilies, DLG-like (see DLG1; MIM 601014), ZO1-like (see TJP1; MIM 601009), p55-like (see MPP1; MIM 305360), and LIN2-like (see CASK; MIM 300172), based on their size and the presence of additional domains. MPP6 is a member of the p55-like MAGUK subfamily (Tseng et al., 2001 [PubMed 11311936]). MPP6 ENSG00000105926 NA
phosphoglycerate dehydrogenase 26227 This gene encodes the enzyme which is involved in the early steps of L-serine synthesis in animal cells. L-serine is required for D-serine and other amino acid synthesis. The enzyme requires NAD/NADH as a cofactor and forms homotetramers for activity. Mutations in this gene have been found in a family with congenital microcephaly, psychomotor retardation and other symptoms. Multiple alternatively spliced transcript variants have been found, however the full-length nature of most are not known. PHGDH ENSG00000092621 NA
small nucleolar RNA, H/ACA box 32 692063 NA SNORA32 ENSG00000206799 NA
regulator of G-protein signaling 1 5996 This gene encodes a member of the regulator of G-protein signalling family. This protein is located on the cytosolic side of the plasma membrane and contains a conserved, 120 amino acid motif called the RGS domain. The protein attenuates the signalling activity of G-proteins by binding to activated, GTP-bound G alpha subunits and acting as a GTPase activating protein (GAP), increasing the rate of conversion of the GTP to GDP. This hydrolysis allows the G alpha subunits to bind G beta/gamma subunit heterodimers, forming inactive G-protein heterotrimers, thereby terminating the signal. RGS1 ENSG00000090104 NA
NA ENSG00000265474 NA AC010761.9 ENSG00000265474 NA
sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 1 6695 This gene encodes the protein core of a seminal plasma proteoglycan containing chondroitin- and heparan-sulfate chains. The protein’s function is unknown, although similarity to thyropin-type cysteine protease-inhibitors suggests its function may be related to protease inhibition. SPOCK1 ENSG00000152377 NA
pro-apoptotic WT1 regulator 5074 The tumor suppressor WT1 represses and activates transcription. The protein encoded by this gene is a WT1-interacting protein that itself functions as a transcriptional repressor. It contains a putative leucine zipper domain which interacts with the zinc finger DNA binding domain of WT1. This protein is specifically upregulated during apoptosis of prostate cells. PAWR ENSG00000177425 NA
ribosomal protein L35 pseudogene 1 ENSG00000237991 NA RPL35P1 ENSG00000237991 NA
C-X3-C motif chemokine ligand 1 6376 NA CX3CL1 ENSG00000006210 NA
heat shock protein family D (Hsp60) member 1 3329 This gene encodes a member of the chaperonin family. The encoded mitochondrial protein may function as a signaling molecule in the innate immune system. This protein is essential for the folding and assembly of newly imported proteins in the mitochondria. This gene is adjacent to a related family member and the region between the 2 genes functions as a bidirectional promoter. Several pseudogenes have been associated with this gene. Two transcript variants encoding the same protein have been identified for this gene. Mutations associated with this gene cause autosomal recessive spastic paraplegia 13. HSPD1 ENSG00000144381 NA
NA ENSG00000265840 NA AC010761.10 ENSG00000265840 NA
STX17 antisense RNA 1 441461 NA STX17-AS1 ENSG00000255145 NA
NA ENSG00000271789 NA RP5-1112D6.7 ENSG00000271789 NA
NOP2/Sun RNA methyltransferase family member 6 221078 NA NSUN6 ENSG00000241058 NA
tetraspanin 12 23554 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. TSPAN12 ENSG00000106025 NA
NA ENSG00000231747 NA AC079922.2 ENSG00000231747 NA
ribosomal protein L9 pseudogene 32 ENSG00000242100 NA RPL9P32 ENSG00000242100 NA
UDP-glucose 6-dehydrogenase 7358 The protein encoded by this gene converts UDP-glucose to UDP-glucuronate and thereby participates in the biosynthesis of glycosaminoglycans such as hyaluronan, chondroitin sulfate, and heparan sulfate. These glycosylated compounds are common components of the extracellular matrix and likely play roles in signal transduction, cell migration, and cancer growth and metastasis. The expression of this gene is up-regulated by transforming growth factor beta and down-regulated by hypoxia. Alternative splicing results in multiple transcript variants. UGDH ENSG00000109814 NA
NA ENSG00000271992 NA RP11-42O15.3 ENSG00000271992 NA
TNF receptor associated factor 4 9618 This gene encodes a member of the TNF receptor associated factor (TRAF) family. TRAF proteins are associated with, and mediate signal transduction from members of the TNF receptor superfamily. The encoded protein has been shown to interact with neurotrophin receptor, p75 (NTR/NTSR1), and negatively regulate NTR induced cell death and NF-kappa B activation. This protein has been found to bind to p47phox, a cytosolic regulatory factor included in a multi-protein complex known as NAD(P)H oxidase. This protein thus, is thought to be involved in the oxidative activation of MAPK8/JNK. Alternatively spliced transcript variants have been observed but the full-length nature of only one has been determined. TRAF4 ENSG00000076604 NA
heat shock protein family E (Hsp10) member 1 3336 This gene encodes a major heat shock protein which functions as a chaperonin. Its structure consists of a heptameric ring which binds to another heat shock protein in order to form a symmetric, functional heterodimer which enhances protein folding in an ATP-dependent manner. This gene and its co-chaperonin, HSPD1, are arranged in a head-to-head orientation on chromosome 2. Naturally occurring read-through transcription occurs between this locus and the neighboring locus MOBKL3. HSPE1 ENSG00000115541 NA
NAD kinase 2, mitochondrial 133686 This gene encodes a mitochondrial kinase that catalyzes the phosphorylation of NAD to yield NADP. Mutations in this gene result in 2,4-dienoyl-CoA reductase deficiency. Alternative splicing results in multiple transcript variants. NADK2 ENSG00000152620 NA
NA NA NA NA ENSG00000261280 TRUE
phosphatidylinositol glycan anchor biosynthesis class H pseudogene 1 ENSG00000259657 NA PIGHP1 ENSG00000259657 NA
long intergenic non-protein coding RNA 1473 101927217 NA LINC01473 ENSG00000237877 NA
interleukin 6 3569 This gene encodes a cytokine that functions in inflammation and the maturation of B cells. In addition, the encoded protein has been shown to be an endogenous pyrogen capable of inducing fever in people with autoimmune diseases or infections. The protein is primarily produced at sites of acute and chronic inflammation, where it is secreted into the serum and induces a transcriptional inflammatory response through interleukin 6 receptor, alpha. The functioning of this gene is implicated in a wide variety of inflammation-associated disease states, including susceptibility to diabetes mellitus and systemic juvenile rheumatoid arthritis. Alternative splicing results in multiple transcript variants. IL6 ENSG00000136244 NA
NA ENSG00000242590 NA RP11-54O7.14 ENSG00000242590 NA
deafness, autosomal recessive 59 494513 The protein encoded by this gene is a member of the gasdermin family, a family which is found only in vertebrates. The encoded protein is required for the proper function of auditory pathway neurons. Defects in this gene are a cause of non-syndromic sensorineural deafness autosomal recessive type 59 (DFNB59). DFNB59 ENSG00000204311 NA
NA ENSG00000236255 NA AC009404.2 ENSG00000236255 NA
aldehyde oxidase 1 316 Aldehyde oxidase produces hydrogen peroxide and, under certain conditions, can catalyze the formation of superoxide. Aldehyde oxidase is a candidate gene for amyotrophic lateral sclerosis. AOX1 ENSG00000138356 NA
NA ENSG00000231409 NA RP11-83J16.1 ENSG00000231409 NA
ribosomal protein L35 pseudogene 5 ENSG00000225573 NA RPL35P5 ENSG00000225573 NA
HESX homeobox 1 8820 This gene encodes a conserved homeobox protein that is a transcriptional repressor in the developing forebrain and pituitary gland. Mutations in this gene are associated with septooptic dysplasia, HESX1-related growth hormone deficiency, and combined pituitary hormone deficiency. HESX1 ENSG00000163666 NA
retinoic acid receptor responder 1 5918 This gene was identified as a retinoic acid (RA) receptor-responsive gene. It encodes a type 1 membrane protein. The expression of this gene is upregulated by tazarotene as well as by retinoic acid receptors. The expression of this gene is found to be downregulated in prostate cancer, which is caused by the methylation of its promoter and CpG island. Alternatively spliced transcript variants encoding distinct isoforms have been observed. RARRES1 ENSG00000118849 NA
chromosome 8 open reading frame 4 56892 This gene encodes a small, monomeric, predominantly unstructured protein that functions as a positive regulator of the Wnt/beta-catenin signaling pathway. This protein interacts with a repressor of beta-catenin mediated transcription at nuclear speckles. It is thought to competitively block interactions of the repressor with beta-catenin, resulting in up-regulation of beta-catenin target genes. The encoded protein may also play a role in the NF-kappaB and ERK1/2 signaling pathways. Expression of this gene may play a role in the proliferation of several types of cancer including thyroid cancer, breast cancer and hematological malignancies. C8orf4 ENSG00000176907 NA
notch 4 4855 This gene encodes a member of the NOTCH family of proteins. Members of this Type I transmembrane protein family share structural characteristics including an extracellular domain consisting of multiple epidermal growth factor-like (EGF) repeats, and an intracellular domain consisting of multiple different domain types. Notch signaling is an evolutionarily conserved intercellular signaling pathway that regulates interactions between physically adjacent cells through binding of Notch family receptors to their cognate ligands. The encoded preproprotein is proteolytically processed in the trans-Golgi network to generate two polypeptide chains that heterodimerize to form the mature cell-surface receptor. This receptor may play a role in vascular, renal and hepatic development. Mutations in this gene may be associated with schizophrenia. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. NOTCH4 ENSG00000204301 NA
early growth response 2 1959 The protein encoded by this gene is a transcription factor with three tandem C2H2-type zinc fingers. Defects in this gene are associated with Charcot-Marie-Tooth disease type 1D (CMT1D), Charcot-Marie-Tooth disease type 4E (CMT4E), and with Dejerine-Sottas syndrome (DSS). Multiple transcript variants encoding two different isoforms have been found for this gene. EGR2 ENSG00000122877 NA
NA NA NA NA ENSG00000269165 TRUE
CDC28 protein kinase regulatory subunit 1B 1163 CKS1B protein binds to the catalytic subunit of the cyclin dependent kinases and is essential for their biological function. The CKS1B mRNA is found to be expressed in different patterns through the cell cycle in HeLa cells, which reflects a specialized role for the encoded protein. At least two transcript variants have been identified for this gene, and it appears that only one of them encodes a protein. CKS1B ENSG00000173207 NA
dihydroorotate dehydrogenase (quinone) 1723 The protein encoded by this gene catalyzes the fourth enzymatic step, the ubiquinone-mediated oxidation of dihydroorotate to orotate, in de novo pyrimidine biosynthesis. This protein is a mitochondrial protein located on the outer surface of the inner mitochondrial membrane. DHODH ENSG00000102967 NA
eukaryotic translation initiation factor 4E binding protein 1 1978 This gene encodes one member of a family of translation repressor proteins. The protein directly interacts with eukaryotic translation initiation factor 4E (eIF4E), which is a limiting component of the multisubunit complex that recruits 40S ribosomal subunits to the 5’ end of mRNAs. Interaction of this protein with eIF4E inhibits complex assembly and represses translation. This protein is phosphorylated in response to various signals including UV irradiation and insulin signaling, resulting in its dissociation from eIF4E and activation of mRNA translation. EIF4EBP1 ENSG00000187840 NA
GTP cyclohydrolase 1 2643 This gene encodes a member of the GTP cyclohydrolase family. The encoded protein is the first and rate-limiting enzyme in tetrahydrobiopterin (BH4) biosynthesis, catalyzing the conversion of GTP into 7,8-dihydroneopterin triphosphate. BH4 is an essential cofactor required by aromatic amino acid hydroxylases as well as nitric oxide synthases. Mutations in this gene are associated with malignant hyperphenylalaninemia and dopa-responsive dystonia. Several alternatively spliced transcript variants encoding different isoforms have been described; however, not all variants give rise to a functional enzyme. GCH1 ENSG00000131979 NA
smoothened, frizzled class receptor 6608 The protein encoded by this gene is a G protein-coupled receptor that interacts with the patched protein, a receptor for hedgehog proteins. The encoded protein transduces signals to other proteins after activation by a hedgehog protein/patched protein complex. SMO ENSG00000128602 NA
methylmalonic aciduria (cobalamin deficiency) cblB type 326625 This gene encodes a protein that catalyzes the final step in the conversion of vitamin B(12) into adenosylcobalamin (AdoCbl), a vitamin B12-containing coenzyme for methylmalonyl-CoA mutase. Mutations in the gene are the cause of vitamin B12-dependent methylmalonic aciduria linked to the cblB complementation group. Alternatively spliced transcript variants have been found. MMAB ENSG00000139428 NA
gap junction protein alpha 4 2701 This gene encodes a member of the connexin gene family. The encoded protein is a component of gap junctions, which are composed of arrays of intercellular channels that provide a route for the diffusion of low molecular weight materials from cell to cell. Mutations in this gene have been associated with atherosclerosis and a higher risk of myocardial infarction. GJA4 ENSG00000187513 NA
ribosomal protein S7 pseudogene 3 ENSG00000231940 NA RPS7P3 ENSG00000231940 NA
NA ENSG00000212664 NA RP11-592N21.1 ENSG00000212664 NA
protein phosphatase 1 regulatory inhibitor subunit 14B 26472 NA PPP1R14B ENSG00000173457 NA
hydroxysteroid 17-beta dehydrogenase 7 51478 HSD17B7 encodes an enzyme that functions both as a 17-beta-hydroxysteroid dehydrogenase (EC 1.1.1.62) in the biosynthesis of sex steroids and as a 3-ketosteroid reductase (EC 1.1.1.270) in the biosynthesis of cholesterol (Marijanovic et al., 2003 [PubMed 12829805]). HSD17B7 ENSG00000132196 NA
arylformamidase 125061 NA AFMID ENSG00000183077 NA
solute carrier family 43 member 1 8501 SLC43A1 belongs to the system L family of plasma membrane carrier proteins that transports large neutral amino acids (Babu et al., 2003 [PubMed 12930836]). SLC43A1 ENSG00000149150 NA
ZFP36 ring finger protein-like 1 677 This gene is a member of the TIS11 family of early response genes, which are induced by various agonists such as the phorbol ester TPA and the polypeptide mitogen EGF. This gene is well conserved across species and has a promoter that contains motifs seen in other early-response genes. The encoded protein contains a distinguishing putative zinc finger domain with a repeating cys-his motif. This putative nuclear transcription factor most likely functions in regulating the response to growth factors. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ZFP36L1 ENSG00000185650 NA
interferon induced transmembrane protein 4 pseudogene ENSG00000235821 NA IFITM4P ENSG00000235821 NA
NA ENSG00000197813 NA CTC-301O7.4 ENSG00000197813 NA
crystallin zeta 1429 Crystallins are separated into two classes: taxon-specific, or enzyme, and ubiquitous. The latter class constitutes the major proteins of vertebrate eye lens and maintains the transparency and refractive index of the lens. The former class is also called phylogenetically-restricted crystallins. This gene encodes a taxon-specific crystallin protein which has NADPH-dependent quinone reductase activity distinct from other known quinone reductases. It lacks alcohol dehydrogenase activity although by similarity it is considered a member of the zinc-containing alcohol dehydrogenase family. Unlike other mammalian species, in humans, lens expression is low. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. One pseudogene is known to exist. CRYZ ENSG00000116791 NA
ribosomal protein L17 pseudogene 50 ENSG00000213700 NA RPL17P50 ENSG00000213700 NA
ribosomal protein S2 pseudogene 48 ENSG00000233380 NA RPS2P48 ENSG00000233380 NA
NA NA NA NA ENSG00000273097 TRUE
topoisomerase (DNA) I, mitochondrial 116447 This gene encodes a mitochondrial DNA topoisomerase that plays a role in the modification of DNA topology. The encoded protein is a type IB topoisomerase and catalyzes the transient breaking and rejoining of DNA to relieve tension and DNA supercoiling generated in the mitochondrial genome during replication and transcription. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. TOP1MT ENSG00000184428 NA
NA NA NA NA ENSG00000269999 TRUE
elongation factor, RNA polymerase II, 2 pseudogene 1 ENSG00000227295 NA ELL2P1 ENSG00000227295 NA
argininosuccinate synthetase 1 pseudogene 2 ENSG00000223922 NA ASS1P2 ENSG00000223922 NA
oligodendrocyte maturation-associated long intergenic non-coding RNA ENSG00000235823 NA OLMALINC ENSG00000235823 NA
NA ENSG00000254676 NA RP11-727A23.4 ENSG00000254676 NA
proliferation-associated 2G4 pseudogene 4 ENSG00000230457 NA PA2G4P4 ENSG00000230457 NA
RNA terminal phosphate cyclase like 1 10171 NA RCL1 ENSG00000120158 NA
heat shock protein family A (Hsp70) member 4 like 22824 The protein encoded by this gene is heat shock inducible and may act as a chaperone. The encoded protein can protect the heat-shocked cell against the harmful effects of aggregated proteins. This gene is highly expressed in leukemia cells and may be a good target for therapeutic intervention. Several transcripts encoding different isoforms have been found for this gene. HSPA4L ENSG00000164070 NA
grainyhead like transcription factor 1 29841 This gene encodes a member of the grainyhead family of transcription factors. The encoded protein can exist as a homodimer or can form heterodimers with sister-of-mammalian grainyhead or brother-of-mammalian grainyhead. This protein functions as a transcription factor during development. GRHL1 ENSG00000134317 NA
NA ENSG00000264924 NA RP11-799B12.2 ENSG00000264924 NA
CDC28 protein kinase regulatory subunit 1B pseudogene 3 ENSG00000268942 NA CKS1BP3 ENSG00000268942 NA
heat shock protein family E (Hsp10) member 1 pseudogene 2 ENSG00000258645 NA HSPE1P2 ENSG00000258645 NA
SIX homeobox 5 147912 The protein encoded by this gene is a homeodomain-containing transcription factor that appears to function in the regulation of organogenesis. This gene is located downstream of the dystrophia myotonica-protein kinase gene. Mutations in this gene are a cause of branchiootorenal syndrome type 2. SIX5 ENSG00000177045 NA
growth arrest specific 2 2620 The protein encoded by this gene is a caspase-3 substrate that plays a role in regulating microfilament and cell shape changes during apoptosis. It can also modulate cell susceptibility to p53-dependent apoptosis by inhibiting calpain activity. Multiple alternatively spliced variants, encoding the same protein, have been identified. GAS2 ENSG00000148935 NA
NA ENSG00000255968 NA RP11-513G19.1 ENSG00000255968 NA
NA ENSG00000264281 NA CTD-2031P19.4 ENSG00000264281 NA
NA ENSG00000265194 NA RP11-70L8.4 ENSG00000265194 NA
O-6-methylguanine-DNA methyltransferase 4255 Alkylating agents are potent carcinogens that can result in cell death, mutation and cancer. The protein encoded by this gene is a DNA repair protein that is involved in cellular defense against mutagenesis and toxicity from alkylating agents. The protein catalyzes transfer of methyl groups from O(6)-alkylguanine and other methylated moieties of the DNA to its own molecule, which repairs the toxic lesions. Methylation of the gene’s promoter has been associated with several cancer types, including colorectal cancer, lung cancer, lymphoma and glioblastoma. MGMT ENSG00000170430 NA
glutaminase 2 27165 The protein encoded by this gene is a mitochondrial phosphate-activated glutaminase that catalyzes the hydrolysis of glutamine to stoichiometric amounts of glutamate and ammonia. Originally thought to be liver-specific, this protein has been found in other tissues as well. Alternative splicing results in multiple transcript variants that encode different isoforms. GLS2 ENSG00000135423 NA
tetratricopeptide repeat domain 39C 125488 NA TTC39C ENSG00000168234 NA
phosphodiesterase 8A 5151 The protein encoded by this gene belongs to the cyclic nucleotide phosphodiesterase (PDE) family, and PDE8 subfamily. This PDE hydrolyzes the second messenger, cAMP, which is a regulator and mediator of a number of cellular responses to extracellular signals. Thus, by regulating the cellular concentration of cAMP, this protein plays a key role in many important physiological processes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. PDE8A ENSG00000073417 NA
nuclear factor I B 4781 NA NFIB ENSG00000147862 NA
microtubule associated monooxygenase, calponin and LIM domain containing 2 9645 NA MICAL2 ENSG00000133816 NA
serine/threonine kinase 3 6788 This gene encodes a serine/threonine protein kinase activated by proapoptotic molecules indicating the encoded protein functions as a growth suppressor. Cleavage of the protein product by caspase removes the inhibitory C-terminal portion. The N-terminal portion is transported to the nucleus where it homodimerizes to form the active kinase which promotes the condensation of chromatin during apoptosis. Multiple transcript variants encoding different isoforms have been found for this gene. STK3 ENSG00000104375 NA
guanidinoacetate N-methyltransferase 2593 The protein encoded by this gene is a methyltransferase that converts guanidoacetate to creatine, using S-adenosylmethionine as the methyl donor. Defects in this gene have been implicated in neurologic syndromes and muscular hypotonia, probably due to creatine deficiency and accumulation of guanidinoacetate in the brain of affected individuals. Two transcript variants encoding different isoforms have been described for this gene. Pseudogenes of this gene are found on chromosomes 2 and 13. GAMT ENSG00000130005 NA
protein tyrosine phosphatase type IVA, member 1 7803 This gene encodes a member of a small class of prenylated protein tyrosine phosphatases (PTPs), which contain a PTP domain and a characteristic C-terminal prenylation motif. The encoded protein is a cell signaling molecule that plays regulatory roles in a variety of cellular processes, including cell proliferation and migration. The protein may also be involved in cancer development and metastasis. This tyrosine phosphatase is a nuclear protein, but may associate with plasma membrane by means of its prenylation motif. Pseudogenes related to this gene are located on chromosomes 1, 2, 5, 7, 11 and X. PTP4A1 ENSG00000112245 NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_",8,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
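
The write step above is repeated for every factor with only the factor index changing. As a minimal sketch, it could be wrapped in a small helper; `write_cluster_genes` and its `out_dir` argument are hypothetical names introduced here for illustration and are not part of the original analysis scripts.

```r
# Hypothetical helper (not part of the original scripts): write the queried
# gene IDs for factor k to "gene_names_clus_<k>.txt", one ID per line.
write_cluster_genes <- function(ids, k, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(data.frame(id = as.character(ids)), path,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Example with two IDs from the table above, written to a temporary directory.
demo_ids <- c("ENSG00000170430", "ENSG00000135423")
demo_path <- write_cluster_genes(demo_ids, 8, tempdir())
```

In the analysis itself the call would presumably be `write_cluster_genes(out$query, 8, "../utilities/GTEX2013_sparse_load_voom")`.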

Factor 9 Annotations

out <- mygene::queryMany(gene_list[9,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id summary name symbol query notfound
5105 This gene is a main control point for the regulation of gluconeogenesis. The cytosolic enzyme encoded by this gene, along with GTP, catalyzes the formation of phosphoenolpyruvate from oxaloacetate, with the release of carbon dioxide and GDP. The expression of this gene can be regulated by insulin, glucocorticoids, glucagon, cAMP, and diet. Defects in this gene are a cause of cytosolic phosphoenolpyruvate carboxykinase deficiency. A mitochondrial isozyme of the encoded protein also has been characterized. phosphoenolpyruvate carboxykinase 1 PCK1 ENSG00000124253 NA
3699 This gene encodes the heavy chain subunit of the pre-alpha-trypsin inhibitor complex. This complex may stabilize the extracellular matrix through its ability to bind hyaluronic acid. Polymorphisms of this gene may be associated with increased risk for schizophrenia and major depressive disorder. This gene is present in an inter-alpha-trypsin inhibitor family gene cluster on chromosome 3. inter-alpha-trypsin inhibitor heavy chain 3 ITIH3 ENSG00000162267 NA
3263 This gene encodes a plasma glycoprotein that binds heme with high affinity. The encoded protein is an acute phase protein that transports heme from the plasma to the liver and may be involved in protecting cells from oxidative stress. hemopexin HPX ENSG00000110169 NA
7018 This gene encodes a glycoprotein with an approximate molecular weight of 76.5 kDa. It is thought to have been created as a result of an ancient gene duplication event that led to generation of homologous C and N-terminal domains each of which binds one ion of ferric iron. The function of this protein is to transport iron from the intestine, reticuloendothelial system, and liver parenchymal cells to all proliferating cells in the body. This protein may also have a physiologic role as granulocyte/pollen-binding protein (GPBP) involved in the removal of certain organic matter and allergens from serum. transferrin TF ENSG00000091513 NA
1373 The mitochondrial enzyme encoded by this gene catalyzes synthesis of carbamoyl phosphate from ammonia and bicarbonate. This reaction is the first committed step of the urea cycle, which is important in the removal of excess urea from cells. The encoded protein may also represent a core mitochondrial nucleoid protein. Three transcript variants encoding different isoforms have been found for this gene. The shortest isoform may not be localized to the mitochondrion. Mutations in this gene have been associated with carbamoyl phosphate synthetase deficiency, susceptibility to persistent pulmonary hypertension, and susceptibility to venoocclusive disease after bone marrow transplantation. carbamoyl-phosphate synthase 1 CPS1 ENSG00000021826 NA
9547 This antimicrobial gene belongs to the cytokine gene family which encode secreted proteins involved in immunoregulatory and inflammatory processes. The protein encoded by this gene is structurally related to the CXC (Cys-X-Cys) subfamily of cytokines. Members of this subfamily are characterized by two cysteines separated by a single amino acid. This cytokine displays chemotactic activity for monocytes but not for lymphocytes, dendritic cells, neutrophils or macrophages. It has been implicated that this cytokine is involved in the homeostasis of monocyte-derived macrophages rather than in inflammation. C-X-C motif chemokine ligand 14 CXCL14 ENSG00000145824 NA
104326055 NA APOA1 antisense RNA APOA1-AS ENSG00000235910 NA
341 This gene encodes a member of the apolipoprotein C1 family. This gene is expressed primarily in the liver, and it is activated when monocytes differentiate into macrophages. The encoded protein plays a central role in high density lipoprotein (HDL) and very low density lipoprotein (VLDL) metabolism. This protein has also been shown to inhibit cholesteryl ester transfer protein in plasma. A pseudogene of this gene is located 4 kb downstream in the same orientation, on the same chromosome. This gene is mapped to chromosome 19, where it resides within a apolipoprotein gene cluster. apolipoprotein C1 APOC1 ENSG00000130208 NA
2244 The protein encoded by this gene is the beta component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including afibrinogenemia, dysfibrinogenemia, hypodysfibrinogenemia and thrombotic tendency. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. fibrinogen beta chain FGB ENSG00000171564 NA
6580 Polyspecific organic cation transporters in the liver, kidney, intestine, and other organs are critical for elimination of many endogenous small organic cations as well as a wide array of drugs and environmental toxins. This gene is one of three similar cation transporter genes located in a cluster on chromosome 6. The encoded protein contains twelve putative transmembrane domains and is a plasma integral membrane protein. Two transcript variants encoding two different isoforms have been found for this gene, but only the longer variant encodes a functional transporter. solute carrier family 22 member 1 SLC22A1 ENSG00000175003 NA
1356 The protein encoded by this gene is a metalloprotein that binds most of the copper in plasma and is involved in the peroxidation of Fe(II)transferrin to Fe(III) transferrin. Mutations in this gene cause aceruloplasminemia, which results in iron accumulation and tissue damage, and is associated with diabetes and neurologic abnormalities. Two transcript variants, one protein-coding and the other not protein-coding, have been found for this gene. ceruloplasmin (ferroxidase) CP ENSG00000047457 NA
733 The protein encoded by this gene belongs to the lipocalin family. It is one of the three subunits that constitutes complement component 8 (C8), which is composed of a disulfide-linked C8 alpha-gamma heterodimer and a non-covalently associated C8 beta chain. C8 participates in the formation of the membrane attack complex (MAC) on bacterial cell membranes. While subunits alpha and beta play a role in complement-mediated bacterial killing, the gamma subunit is not required for the bactericidal activity. complement component 8, gamma polypeptide C8G ENSG00000176919 NA
84959 This gene encodes a protein that contains a ubiquitin associated domain at the N-terminus, an SH3 domain, and a C-terminal domain with similarities to the catalytic motif of phosphoglycerate mutase. The encoded protein was found to inhibit endocytosis of epidermal growth factor receptor (EGFR) and platelet-derived growth factor receptor. ubiquitin associated and SH3 domain containing B UBASH3B ENSG00000154127 NA
729238 This gene is one of several genes encoding pulmonary-surfactant associated proteins (SFTPA) located on chromosome 10. Mutations in this gene and a highly similar gene located nearby, which affect the highly conserved carbohydrate recognition domain, are associated with idiopathic pulmonary fibrosis. The current version of the assembly displays only a single centromeric SFTPA gene pair rather than the two gene pairs shown in the previous assembly which were thought to have resulted from a duplication. surfactant protein A2 SFTPA2 ENSG00000185303 NA
3242 The protein encoded by this gene is an enzyme in the catabolic pathway of tyrosine. The encoded protein catalyzes the conversion of 4-hydroxyphenylpyruvate to homogentisate. Defects in this gene are a cause of tyrosinemia type 3 (TYRO3) and hawkinsinuria (HAWK). Two transcript variants encoding different isoforms have been found for this gene. 4-hydroxyphenylpyruvate dioxygenase HPD ENSG00000158104 NA
23108 This gene encodes a GTPase-activating protein that activates the small guanine-nucleotide-binding protein Rap1 in platelets. The protein interacts with synaptotagmin-like protein 1 and Rab27 and regulates secretion of dense granules from platelets at sites of endothelial damage. Multiple transcript variants encoding different isoforms have been found for this gene. RAP1 GTPase activating protein 2 RAP1GAP2 ENSG00000132359 NA
6695 This gene encodes the protein core of a seminal plasma proteoglycan containing chondroitin- and heparan-sulfate chains. The protein’s function is unknown, although similarity to thyropin-type cysteine protease-inhibitors suggests its function may be related to protease inhibition. sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 1 SPOCK1 ENSG00000152377 NA
5413 This gene is a member of the septin gene family of nucleotide binding proteins, originally described in yeast as cell division cycle regulatory proteins. Septins are highly conserved in yeast, Drosophila, and mouse and appear to regulate cytoskeletal organization. Disruption of septin function disturbs cytokinesis and results in large multinucleate or polyploid cells. This gene is mapped to 22q11, the region frequently deleted in DiGeorge and velocardiofacial syndromes. A translocation involving the MLL gene and this gene has also been reported in patients with acute myeloid leukemia. Alternative splicing results in multiple transcript variants. The presence of a non-consensus polyA signal (AACAAT) in this gene also results in read-through transcription into the downstream neighboring gene (GP1BB; platelet glycoprotein Ib), whereby larger, non-coding transcripts are produced. septin 5 SEPT5 ENSG00000184702 NA
90139 NA tetraspanin 18 TSPAN18 ENSG00000157570 NA
8608 NA retinol dehydrogenase 16 (all-trans) RDH16 ENSG00000139547 NA
3240 This gene encodes a preproprotein, which is processed to yield both alpha and beta chains, which subsequently combine as a tetramer to produce haptoglobin. Haptoglobin functions to bind free plasma hemoglobin, which allows degradative enzymes to gain access to the hemoglobin, while at the same time preventing loss of iron through the kidneys and protecting the kidneys from damage by hemoglobin. Mutations in this gene and/or its regulatory regions cause ahaptoglobinemia or hypohaptoglobinemia. This gene has also been linked to diabetic nephropathy, the incidence of coronary artery disease in type 1 diabetes, Crohn’s disease, inflammatory disease behavior, primary sclerosing cholangitis, susceptibility to idiopathic Parkinson’s disease, and a reduced incidence of Plasmodium falciparum malaria. The protein encoded also exhibits antimicrobial activity against bacteria. A similar duplicated gene is located next to this gene on chromosome 16. Multiple transcript variants encoding different isoforms have been found for this gene. haptoglobin HP ENSG00000257017 NA
9645 NA microtubule associated monooxygenase, calponin and LIM domain containing 2 MICAL2 ENSG00000133816 NA
5590 Protein kinase C (PKC) zeta is a member of the PKC family of serine/threonine kinases which are involved in a variety of cellular processes such as proliferation, differentiation and secretion. Unlike the classical PKC isoenzymes which are calcium-dependent, PKC zeta exhibits a kinase activity which is independent of calcium and diacylglycerol but not of phosphatidylserine. Furthermore, it is insensitive to typical PKC inhibitors and cannot be activated by phorbol ester. Unlike the classical PKC isoenzymes, it has only a single zinc finger module. These structural and biochemical properties indicate that the zeta subspecies is related to, but distinct from other isoenzymes of PKC. Alternative splicing results in multiple transcript variants encoding different isoforms. protein kinase C zeta PRKCZ ENSG00000067606 NA
ENSG00000269934 NA NA RP5-1139B12.3 ENSG00000269934 NA
9806 This gene encodes a protein which binds with glycosaminoglycans to form part of the extracellular matrix. The protein contains thyroglobulin type-1, follistatin-like, and calcium-binding domains, and has glycosaminoglycan attachment sites in the acidic C-terminal region. Three alternatively spliced transcript variants that encode different protein isoforms have been described for this gene. sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 2 SPOCK2 ENSG00000107742 NA
1 The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins. alpha-1-B glycoprotein A1BG ENSG00000121410 NA
11123 NA RCAN family member 3 RCAN3 ENSG00000117602 NA
ENSG00000214425 NA leucine-rich repeat containing 37 member A4, pseudogene LRRC37A4P ENSG00000214425 NA
125 The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. alcohol dehydrogenase 1B (class I), beta polypeptide ADH1B ENSG00000196616 NA
10841 The protein encoded by this gene is a bifunctional enzyme that channels 1-carbon units from formiminoglutamate, a metabolite of the histidine degradation pathway, to the folate pool. Mutations in this gene are associated with glutamate formiminotransferase deficiency. Alternatively spliced transcript variants have been found for this gene. formimidoyltransferase cyclodeaminase FTCD ENSG00000160282 NA
55908 NA angiopoietin like 8 ANGPTL8 ENSG00000130173 NA
221895 This gene encodes a nuclear protein with three C2H2-type zinc fingers, and functions as a transcriptional repressor. Chromosomal aberrations involving this gene are associated with endometrial stromal tumors. Alternatively spliced variants which encode different protein isoforms have been described; however, not all variants have been fully characterized. JAZF zinc finger 1 JAZF1 ENSG00000153814 NA
9076 Tight junctions represent one mode of cell-to-cell adhesion in epithelial or endothelial cell sheets, forming continuous seals around cells and serving as a physical barrier to prevent solutes and water from passing freely through the paracellular space. These junctions are comprised of sets of continuous networking strands in the outwardly facing cytoplasmic leaflet, with complementary grooves in the inwardly facing extracytoplasmic leaflet. The protein encoded by this gene, a member of the claudin family, is an integral membrane protein and a component of tight junction strands. Loss of function mutations result in neonatal ichthyosis-sclerosing cholangitis syndrome. claudin 1 CLDN1 ENSG00000163347 NA
23762 The protein encoded by this gene contains a pleckstrin homology (PH) domain and an oxysterol-binding region. It binds oxysterols such as 7-ketocholesterol and may inhibit their cytotoxicity. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. oxysterol binding protein 2 OSBP2 ENSG00000184792 NA
9751 Syntaxin-1, synaptobrevin/VAMP, and SNAP25 interact to form the SNARE complex, which is required for synaptic vesicle docking and fusion. The protein encoded by this gene is membrane-associated and inhibits SNARE complex formation by binding free syntaxin-1. Expression of this gene appears to be brain-specific. Alternative splicing results in multiple transcript variants encoding different isoforms. syntaphilin SNPH ENSG00000101298 NA
79695 This gene encodes a member of a family of UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferases, which catalyze the transfer of N-acetylgalactosamine (GalNAc) from UDP-GalNAc to a serine or threonine residue on a polypeptide acceptor in the initial step of O-linked protein glycosylation. Mutations in this gene are associated with an increased susceptibility to colorectal cancer. polypeptide N-acetylgalactosaminyltransferase 12 GALNT12 ENSG00000119514 NA
51560 NA RAB6B, member RAS oncogene family RAB6B ENSG00000154917 NA
3480 This receptor binds insulin-like growth factor with a high affinity. It has tyrosine kinase activity. The insulin-like growth factor I receptor plays a critical role in transformation events. Cleavage of the precursor generates alpha and beta subunits. It is highly overexpressed in most malignant tissues where it functions as an anti-apoptotic agent by enhancing cell survival. Alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. insulin like growth factor 1 receptor IGF1R ENSG00000140443 NA
9764 NA KIAA0513 KIAA0513 ENSG00000135709 NA
ENSG00000189316 NA NA RP11-797H7.5 ENSG00000189316 NA
ENSG00000215861 NA NA WI2-1896O14.1 ENSG00000215861 NA
221421 This gene encodes a protein thought to be a component of the radial spoke head in motile cilia and flagella. Mutations in this gene are associated with primary ciliary dyskinesia 12. Alternative splicing results in multiple transcript variants. radial spoke head 9 homolog RSPH9 ENSG00000172426 NA
8659 This protein belongs to the aldehyde dehydrogenase family of proteins. This enzyme is a mitochondrial matrix NAD-dependent dehydrogenase which catalyzes the second step of the proline degradation pathway, converting pyrroline-5-carboxylate to glutamate. Deficiency of this enzyme is associated with type II hyperprolinemia, an autosomal recessive disorder characterized by accumulation of delta-1-pyrroline-5-carboxylate (P5C) and proline. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. aldehyde dehydrogenase 4 family member A1 ALDH4A1 ENSG00000159423 NA
161145 NA transmembrane protein 229B TMEM229B ENSG00000198133 NA
27063 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. ankyrin repeat domain 1 ANKRD1 ENSG00000148677 NA
27124 NA inositol polyphosphate-5-phosphatase J INPP5J ENSG00000185133 NA
9911 NA transmembrane and coiled-coil domain family 2 TMCC2 ENSG00000133069 NA
1488 This gene produces alternative transcripts encoding two distinct proteins. One protein is a transcriptional repressor, while the other isoform is a major component of specialized synapses known as synaptic ribbons. Both proteins contain a NAD+ binding domain similar to NAD+-dependent 2-hydroxyacid dehydrogenases. A portion of the 3’ untranslated region was used to map this gene to chromosome 21q21.3; however, it was noted that similar loci elsewhere in the genome are likely. Blast analysis shows that this gene is present on chromosome 10. Several transcript variants encoding two different isoforms have been found for this gene. C-terminal binding protein 2 CTBP2 ENSG00000175029 NA
2706 This gene encodes a member of the gap junction protein family. The gap junctions were first characterized by electron microscopy as regionally specialized structures on plasma membranes of contacting adherent cells. These structures were shown to consist of cell-to-cell channels that facilitate the transfer of ions and small molecules between cells. The gap junction proteins, also known as connexins, purified from fractions of enriched gap junctions from different tissues differ. According to sequence similarities at the nucleotide and amino acid levels, the gap junction proteins are divided into two categories, alpha and beta. Mutations in this gene are responsible for as much as 50% of pre-lingual, recessive deafness. gap junction protein beta 2 GJB2 ENSG00000165474 NA
3797 NA kinesin family member 3C KIF3C ENSG00000084731 NA
ENSG00000254680 NA NA RP11-265D17.2 ENSG00000254680 NA
57863 IGSF4B is a brain-specific protein related to the calcium-independent cell-cell adhesion molecules known as nectins (see PVRL3; MIM 607147) (Kakunaga et al., 2005 [PubMed 15741237]). cell adhesion molecule 3 CADM3 ENSG00000162706 NA
ENSG00000268230 NA NA CTD-2619J13.8 ENSG00000268230 NA
ENSG00000261172 NA NA RP11-356C4.5 ENSG00000261172 NA
335 This gene encodes apolipoprotein A-I, which is the major protein component of high density lipoprotein (HDL) in plasma. The encoded preproprotein is proteolytically processed to generate the mature protein, which promotes cholesterol efflux from tissues to the liver for excretion, and is a cofactor for lecithin cholesterolacyltransferase (LCAT), an enzyme responsible for the formation of most plasma cholesteryl esters. This gene is closely linked with two other apolipoprotein genes on chromosome 11. Defects in this gene are associated with HDL deficiencies, including Tangier disease, and with systemic non-neuropathic amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein. apolipoprotein A1 APOA1 ENSG00000118137 NA
6252 This gene belongs to the family of reticulon encoding genes. Reticulons are associated with the endoplasmic reticulum, and are involved in neuroendocrine secretion or in membrane trafficking in neuroendocrine cells. This gene is considered to be a specific marker for neurological diseases and cancer, and is a potential molecular target for therapy. Alternative splicing results in multiple transcript variants. reticulon 1 RTN1 ENSG00000139970 NA
1795 This gene is specifically expressed in the central nervous system (CNS). It encodes a member of the DOCK (dedicator of cytokinesis) family of guanine nucleotide exchange factors (GEFs). This protein, dedicator of cytokinesis 3 (DOCK3), is also known as modifier of cell adhesion (MOCA) and presenilin-binding protein (PBP). The DOCK3 and DOCK1, -2 and -4 share several conserved amino acids in their DHR-2 (DOCK homology region 2) domains that are required for GEF activity, and bind directly to WAVE proteins [Wiskott-Aldrich syndrome protein (WASP) family Verprolin-homologous proteins] via their DHR-1 domains. The DOCK3 induces axonal outgrowth in CNS by stimulating membrane recruitment of the WAVE complex and activating the small G protein Rac1. This gene is associated with an attention deficit hyperactivity disorder-like phenotype by a complex chromosomal rearrangement. dedicator of cytokinesis 3 DOCK3 ENSG00000088538 NA
79827 This gene encodes a type I transmembrane protein that is localized to junctional complexes between endothelial and epithelial cells and may have a role in cell-cell adhesion. Expression of this gene in white adipose tissue is implicated in adipocyte maturation and development of obesity. This gene is also essential for normal intestinal development and mutations in the gene are associated with congenital short bowel syndrome. CXADR-like membrane protein CLMP ENSG00000166250 NA
5452 The protein encoded by this gene is a homeobox-containing transcription factor of the POU domain family. The encoded protein binds the octamer sequence 5’-ATTTGCAT-3’, a common transcription factor binding site in immunoglobulin gene promoters. Several transcript variants encoding different isoforms have been found for this gene. POU class 2 homeobox 2 POU2F2 ENSG00000028277 NA
ENSG00000232320 NA NA AC009299.5 ENSG00000232320 NA
100874235 NA CACNA1C antisense RNA 2 CACNA1C-AS2 ENSG00000256271 NA
2043 This gene belongs to the ephrin receptor subfamily of the protein-tyrosine kinase family. EPH and EPH-related receptors have been implicated in mediating developmental events, particularly in the nervous system. Receptors in the EPH subfamily typically have a single kinase domain and an extracellular region containing a Cys-rich domain and 2 fibronectin type III repeats. The ephrin receptors are divided into 2 groups based on the similarity of their extracellular domain sequences and their affinities for binding ephrin-A and ephrin-B ligands. Multiple transcript variants encoding different isoforms have been found for this gene. EPH receptor A4 EPHA4 ENSG00000116106 NA
143098 The protein encoded by this gene is a member of the p55 Stardust family of membrane-associated guanylate kinase (MAGUK) proteins, which function in the establishment of epithelial cell polarity. This family member forms a complex with the polarity protein DLG1 (discs, large homolog 1) and facilitates epithelial cell polarity and tight junction formation. Polymorphisms in this gene are associated with variations in site-specific bone mineral density (BMD). Alternative splicing results in multiple transcript variants. membrane palmitoylated protein 7 MPP7 ENSG00000150054 NA
23127 NA collagen beta(1-O)galactosyltransferase 2 COLGALT2 ENSG00000198756 NA
23467 This gene encodes a protein similar to the rat neuronal pentraxin receptor. The rat pentraxin receptor is an integral membrane protein that is thought to mediate neuronal uptake of the snake venom toxin, taipoxin, and its transport into the synapses. Studies in rat indicate that translation of this mRNA initiates at a non-AUG (CUG) codon. This may also be true for mouse and human, based on strong sequence conservation amongst these species. neuronal pentraxin receptor NPTXR ENSG00000221890 NA
9454 This gene encodes a member of the HOMER family of postsynaptic density scaffolding proteins that share a similar domain structure consisting of an N-terminal Enabled/vasodilator-stimulated phosphoprotein homology 1 domain which mediates protein-protein interactions, and a carboxy-terminal coiled-coil domain and two leucine zipper motifs that are involved in self-oligomerization. The encoded protein binds numerous other proteins including group I metabotropic glutamate receptors, inositol 1,4,5-trisphosphate receptors and amyloid precursor proteins and has been implicated in diverse biological functions such as neuronal signaling, T-cell activation and trafficking of amyloid beta peptides. Alternative splicing results in multiple transcript variants. homer scaffolding protein 3 HOMER3 ENSG00000051128 NA
6809 The gene is a member of the syntaxin family. The encoded protein is targeted to the apical membrane of epithelial cells where it forms clusters and is important in establishing and maintaining polarity necessary for protein trafficking involving vesicle fusion and exocytosis. Alternative splicing results in multiple transcript variants. syntaxin 3 STX3 ENSG00000166900 NA
84033 The obscurin gene spans more than 150 kb, contains over 80 exons and encodes a protein of approximately 720 kDa. The encoded protein contains 68 Ig domains, 2 fibronectin domains, 1 calcium/calmodulin-binding domain, 1 RhoGEF domain with an associated PH domain, and 2 serine-threonine kinase domains. This protein belongs to the family of giant sarcomeric signaling proteins that includes titin and nebulin, and may have a role in the organization of myofibrils during assembly and may mediate interactions between the sarcoplasmic reticulum and myofibrils. Alternatively spliced transcript variants encoding different isoforms have been identified. obscurin, cytoskeletal calmodulin and titin-interacting RhoGEF OBSCN ENSG00000154358 NA
116844 The leucine-rich repeat (LRR) family of proteins, including LRG1, have been shown to be involved in protein-protein interaction, signal transduction, and cell adhesion and development. LRG1 is expressed during granulocyte differentiation (O’Donnell et al., 2002 [PubMed 12223515]). leucine rich alpha-2-glycoprotein 1 LRG1 ENSG00000171236 NA
57214 NA cell migration inducing hyaluronan binding protein CEMIP ENSG00000103888 NA
3098 Hexokinases phosphorylate glucose to produce glucose-6-phosphate, the first step in most glucose metabolism pathways. This gene encodes a ubiquitous form of hexokinase which localizes to the outer membrane of mitochondria. Mutations in this gene have been associated with hemolytic anemia due to hexokinase deficiency. Alternative splicing of this gene results in several transcript variants which encode different isoforms, some of which are tissue-specific. hexokinase 1 HK1 ENSG00000156515 NA
ENSG00000227227 NA NA AC017101.10 ENSG00000227227 NA
NA NA NA NA ENSG00000270172 TRUE
ENSG00000252464 NA RNA, 7SK small nuclear pseudogene 70 RN7SKP70 ENSG00000252464 NA
115572 NA family with sequence similarity 46 member B FAM46B ENSG00000158246 NA
306 This gene encodes a member of the annexin family. Members of this calcium-dependent phospholipid-binding protein family play a role in the regulation of cellular growth and in signal transduction pathways. This protein functions in the inhibition of phospholipase A2 and cleavage of inositol 1,2-cyclic phosphate to form inositol 1-phosphate. This protein may also play a role in anti-coagulation. annexin A3 ANXA3 ENSG00000138772 NA
6583 Polyspecific organic cation transporters in the liver, kidney, intestine, and other organs are critical for elimination of many endogenous small organic cations as well as a wide array of drugs and environmental toxins. The encoded protein is an organic cation transporter and plasma integral membrane protein containing eleven putative transmembrane domains as well as a nucleotide-binding site motif. Transport by this protein is at least partially ATP-dependent. solute carrier family 22 member 4 SLC22A4 ENSG00000197208 NA
9435 This locus encodes a sulfotransferase protein. The encoded enzyme catalyzes the sulfation of a nonreducing N-acetylglucosamine residue, and may play a role in biosynthesis of 6-sulfosialyl Lewis X antigen. carbohydrate sulfotransferase 2 CHST2 ENSG00000175040 NA
6785 This gene encodes a membrane-bound protein which is a member of the ELO family, proteins which participate in the biosynthesis of fatty acids. Consistent with the expression of the encoded protein in photoreceptor cells of the retina, mutations and small deletions in this gene are associated with Stargardt-like macular dystrophy (STGD3) and autosomal dominant Stargardt-like macular dystrophy (ADMD), also referred to as autosomal dominant atrophic macular degeneration. ELOVL fatty acid elongase 4 ELOVL4 ENSG00000118402 NA
3768 This gene encodes an inwardly rectifying K+ channel which may be blocked by divalent cations. This protein is thought to be one of multiple inwardly rectifying channels which contribute to the cardiac inward rectifier current (IK1). The gene is located within the Smith-Magenis syndrome region on chromosome 17. potassium voltage-gated channel subfamily J member 12 KCNJ12 ENSG00000184185 NA
273 This gene encodes a protein associated with the cytoplasmic surface of synaptic vesicles. A subset of patients with stiff-man syndrome who were also affected by breast cancer are positive for autoantibodies against this protein. Alternate splicing of this gene results in two transcript variants encoding different isoforms. Additional splice variants have been described, but their full length sequences have not been determined. A pseudogene of this gene is found on chromosome 11. amphiphysin AMPH ENSG00000078053 NA
84940 NA coronin 6 CORO6 ENSG00000167549 NA
1462 This gene is a member of the aggrecan/versican proteoglycan family. The protein encoded is a large chondroitin sulfate proteoglycan and is a major component of the extracellular matrix. This protein is involved in cell adhesion, proliferation, migration and angiogenesis and plays a central role in tissue morphogenesis and maintenance. Mutations in this gene are the cause of Wagner syndrome type 1. Multiple transcript variants encoding different isoforms have been found for this gene. versican VCAN ENSG00000038427 NA
91624 This gene encodes a filamentous actin-binding protein that may function in cell adhesion and migration. Mutations in this gene have been associated with dilated cardiomyopathy, also known as CMD1CC. Alternatively spliced transcript variants have been described. nexilin F-actin binding protein NEXN ENSG00000162614 NA
NA NA NA NA ENSG00000229874 TRUE
ENSG00000239775 NA NA AC017116.11 ENSG00000239775 NA
158471 The protein encoded by this gene belongs to the B-cell CLL/lymphoma 2 and adenovirus E1B 19 kDa interacting family, whose members play roles in many cellular processes including apoptosis, cell transformation, and synaptic function. Several functions for this protein have been demonstrated including suppression of Ras homolog family member A activity, which results in reduced stress fiber formation and suppression of oncogenic cellular transformation. A high molecular weight isoform of this protein has also been shown to colocalize with Adaptor protein complex 2, beta-Adaptin and endodermal markers, suggesting an involvement in post-endocytic trafficking. In prostate cancer cells, this gene acts as a tumor suppressor and its expression is regulated by prostate cancer antigen 3, a non-protein coding gene on the opposite DNA strand in an intron of this gene. Prostate cancer antigen 3 regulates levels of this gene through formation of a double-stranded RNA that undergoes adenosine deaminase acting on RNA-dependent adenosine-to-inosine RNA editing. Alternative splicing results in multiple transcript variants. prune homolog 2 PRUNE2 ENSG00000106772 NA
55365 NA transmembrane protein 176A TMEM176A ENSG00000002933 NA
10570 NA dihydropyrimidinase like 4 DPYSL4 ENSG00000151640 NA
22859 This gene encodes a member of the latrophilin subfamily of G-protein coupled receptors (GPCR). Latrophilins may function in both cell adhesion and signal transduction. In experiments with non-human species, endogenous proteolytic cleavage within a cysteine-rich GPS (G-protein-coupled-receptor proteolysis site) domain resulted in two subunits (a large extracellular N-terminal cell adhesion subunit and a subunit with substantial similarity to the secretin/calcitonin family of GPCRs) being non-covalently bound at the cell membrane. Latrophilin-1 has been shown to recruit the neurotoxin from black widow spider venom, alpha-latrotoxin, to the synapse plasma membrane. Alternative splicing results in multiple variants encoding distinct isoforms. adhesion G protein-coupled receptor L1 ADGRL1 ENSG00000072071 NA
57210 NA solute carrier family 45 member 4 SLC45A4 ENSG00000022567 NA
1636 This gene encodes an enzyme involved in catalyzing the conversion of angiotensin I into a physiologically active peptide angiotensin II. Angiotensin II is a potent vasopressor and aldosterone-stimulating peptide that controls blood pressure and fluid-electrolyte balance. This enzyme plays a key role in the renin-angiotensin system. Many studies have associated the presence or absence of a 287 bp Alu repeat element in this gene with the levels of circulating enzyme or cardiovascular pathophysiologies. Multiple alternatively spliced transcript variants encoding different isoforms have been identified, and two most abundant spliced variants encode the somatic form and the testicular form, respectively, that are equally active. angiotensin I converting enzyme ACE ENSG00000159640 NA
6272 This gene encodes a member of the VPS10-related sortilin family of proteins. The encoded preproprotein is proteolytically processed by furin to generate the mature receptor. This receptor plays a role in the trafficking of different proteins to either the cell surface, or subcellular compartments such as lysosomes and endosomes. Expression levels of this gene may influence the risk of myocardial infarction in human patients. Alternative splicing results in multiple transcript variants. sortilin 1 SORT1 ENSG00000134243 NA
64753 NA coiled-coil domain containing 136 CCDC136 ENSG00000128596 NA
26086 G-protein signaling modulators (GPSMs) play diverse functional roles through their interaction with G-protein subunits. This gene encodes a receptor-independent activator of G protein signaling, which is one of several factors that influence the basal activity of G-protein signaling systems. The protein contains seven tetratricopeptide repeats in its N-terminal half and four G-protein regulatory (GPR) motifs in its C-terminal half. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. G-protein signaling modulator 1 GPSM1 ENSG00000160360 NA
641 The Bloom syndrome gene product is related to the RecQ subset of DExH box-containing DNA helicases and has both DNA-stimulated ATPase and ATP-dependent DNA helicase activities. Mutations causing Bloom syndrome delete or alter helicase motifs and may disable the 3’-5’ helicase activity. The normal protein may act to suppress inappropriate recombination. Bloom syndrome RecQ like helicase BLM ENSG00000197299 NA
157310 The phosphatidylethanolamine (PE)-binding proteins, including PEBP4, are an evolutionarily conserved family of proteins with pivotal biologic functions, such as lipid binding and inhibition of serine proteases (Wang et al., 2004 [PubMed 15302887]). phosphatidylethanolamine binding protein 4 PEBP4 ENSG00000134020 NA
1565 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum and is known to metabolize as many as 25% of commonly prescribed drugs. Its substrates include antidepressants, antipsychotics, analgesics and antitussives, beta adrenergic blocking agents, antiarrhythmics and antiemetics. The gene is highly polymorphic in the human population; certain alleles result in the poor metabolizer phenotype, characterized by a decreased ability to metabolize the enzyme’s substrates. Some individuals with the poor metabolizer phenotype have no functional protein since they carry 2 null alleles whereas in other individuals the gene is absent. This gene can vary in copy number and individuals with the ultrarapid metabolizer phenotype can have 3 or more active copies of the gene. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. cytochrome P450 family 2 subfamily D member 6 CYP2D6 ENSG00000100197 NA
4130 This gene encodes a protein that belongs to the microtubule-associated protein family. The proteins of this family are thought to be involved in microtubule assembly, which is an essential step in neurogenesis. The product of this gene is a precursor polypeptide that presumably undergoes proteolytic processing to generate the final MAP1A heavy chain and LC2 light chain. Expression of this gene is almost exclusively in the brain. Studies of the rat microtubule-associated protein 1A gene suggested a role in early events of spinal cord development. microtubule associated protein 1A MAP1A ENSG00000166963 NA
5265 The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. serpin family A member 1 SERPINA1 ENSG00000197249 NA
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 9, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
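
Each annotation table mixes three kinds of rows: IDs that mygene resolved with a summary, IDs it resolved without a summary, and IDs flagged `notfound = TRUE`. The sketch below shows one way to separate them before the ID list is written to disk. The helper name `summarise_annotations` and the toy data frame are illustrative assumptions, not part of the original analysis; the column names (`query`, `summary`, `notfound`) follow the `queryMany()` output printed in this report.

```r
# Hypothetical helper (an assumption, not in the original scripts):
# split one annotation table, shaped like the queryMany() output,
# into resolved, unresolved, and summary-less Ensembl IDs.
summarise_annotations <- function(out) {
  out <- as.data.frame(out)
  # Defensive assumption: treat a missing 'notfound' column as "all found".
  if (is.null(out$notfound)) {
    missing <- rep(FALSE, nrow(out))
  } else {
    # Unmatched IDs carry notfound = TRUE; matched rows carry NA.
    missing <- !is.na(out$notfound) & out$notfound
  }
  list(
    found      = out$query[!missing],
    not_found  = out$query[missing],
    no_summary = out$query[!missing & is.na(out$summary)]
  )
}

# Toy input built from three rows of the Factor 9 table.
toy <- data.frame(
  query    = c("ENSG00000138772", "ENSG00000270172", "ENSG00000167549"),
  symbol   = c("ANXA3", NA, "CORO6"),
  summary  = c("This gene encodes a member of the annexin family.", NA, NA),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)
res <- summarise_annotations(toy)
```

Passing `res$found` rather than `out$query` to `write.table()` would keep unresolved IDs out of the per-factor gene list; whether that is desirable depends on how the downstream files are used.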

Factor 10 Annotations

out <- mygene::queryMany(gene_list[10,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name query symbol X_id summary notfound
heat shock protein family A (Hsp70) member 6 ENSG00000173110 HSPA6 3310 NA NA
hypoxia inducible lipid droplet associated ENSG00000135245 HILPDA 29923 NA NA
NA ENSG00000240758 RP11-155G14.6 ENSG00000240758 NA NA
prostaglandin-endoperoxide synthase 2 ENSG00000073756 PTGS2 5743 Prostaglandin-endoperoxide synthase (PTGS), also known as cyclooxygenase, is the key enzyme in prostaglandin biosynthesis, and acts both as a dioxygenase and as a peroxidase. There are two isozymes of PTGS: a constitutive PTGS1 and an inducible PTGS2, which differ in their regulation of expression and tissue distribution. This gene encodes the inducible isozyme. It is regulated by specific stimulatory events, suggesting that it is responsible for the prostanoid biosynthesis involved in inflammation and mitogenesis. NA
zinc finger CCCH-type containing 12A ENSG00000163874 ZC3H12A 80149 ZC3H12A is an MCP1 (CCL2; MIM 158105)-induced protein that acts as a transcriptional activator and causes cell death of cardiomyocytes, possibly via induction of genes associated with apoptosis. NA
keratin 8 pseudogene 50 ENSG00000260799 KRT8P50 ENSG00000260799 NA NA
heat shock protein family A (Hsp70) member 1B ENSG00000204388 HSPA1B 3304 This intronless gene encodes a 70kDa heat shock protein which is a member of the heat shock protein 70 family. In conjunction with other heat shock proteins, this protein stabilizes existing proteins against aggregation and mediates the folding of newly translated proteins in the cytosol and in organelles. It is also involved in the ubiquitin-proteasome pathway through interaction with the AU-rich element RNA-binding protein 1. The gene is located in the major histocompatibility complex class III region, in a cluster with two closely related genes which encode similar proteins. NA
suppressor of cytokine signaling 3 ENSG00000184557 SOCS3 9021 This gene encodes a member of the STAT-induced STAT inhibitor (SSI), also known as suppressor of cytokine signaling (SOCS), family. SSI family members are cytokine-inducible negative regulators of cytokine signaling. The expression of this gene is induced by various cytokines, including IL6, IL10, and interferon (IFN)-gamma. The protein encoded by this gene can bind to JAK2 kinase, and inhibit the activity of JAK2 kinase. Studies of the mouse counterpart of this gene suggested the roles of this gene in the negative regulation of fetal liver hematopoiesis, and placental development. NA
regulator of G-protein signaling 2 ENSG00000116741 RGS2 5997 Regulator of G protein signaling (RGS) family members are regulatory molecules that act as GTPase activating proteins (GAPs) for G alpha subunits of heterotrimeric G proteins. RGS proteins are able to deactivate G protein subunits of the Gi alpha, Go alpha and Gq alpha subtypes. They drive G proteins into their inactive GDP-bound forms. Regulator of G protein signaling 2 belongs to this family. The protein acts as a mediator of myeloid differentiation and may play a role in leukemogenesis. NA
immediate early response 3 ENSG00000137331 IER3 8870 This gene functions in the protection of cells from Fas- or tumor necrosis factor type alpha-induced apoptosis. Partially degraded and unspliced transcripts are found after virus infection in vitro, but these transcripts are not found in vivo and do not generate a valid protein. NA
SERPINE1 mRNA binding protein 1 pseudogene 3 ENSG00000242142 SERBP1P3 ENSG00000242142 NA NA
AT-rich interaction domain 5A ENSG00000196843 ARID5A 10865 Members of the ARID protein family, including ARID5A, have diverse functions but all appear to play important roles in development, tissue-specific gene expression, and regulation of cell growth (Patsialou et al., 2005 [PubMed 15640446]). NA
cholinergic receptor nicotinic epsilon subunit ENSG00000108556 CHRNE 1145 Acetylcholine receptors at mature mammalian neuromuscular junctions are pentameric protein complexes composed of four subunits in the ratio of two alpha subunits to one beta, one epsilon, and one delta subunit. The acetylcholine receptor changes subunit composition shortly after birth when the epsilon subunit replaces the gamma subunit seen in embryonic receptors. Mutations in the epsilon subunit are associated with congenital myasthenic syndrome. NA
B-cell CLL/lymphoma 3 ENSG00000069399 BCL3 602 This gene is a proto-oncogene candidate. It is identified by its translocation into the immunoglobulin alpha-locus in some cases of B-cell leukemia. The protein encoded by this gene contains seven ankyrin repeats, which are most closely related to those found in I kappa B proteins. This protein functions as a transcriptional co-activator that activates through its association with NF-kappa B homodimers. The expression of this gene can be induced by NF-kappa B, which forms a part of the autoregulatory loop that controls the nuclear residence of p50 NF-kappa B. NA
FOS like 1, AP-1 transcription factor subunit ENSG00000175592 FOSL1 8061 The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. Several transcript variants encoding different isoforms have been found for this gene. NA
cysteine and serine rich nuclear protein 1 ENSG00000144655 CSRNP1 64651 This gene encodes a protein that localizes to the nucleus and expression of this gene is induced in response to elevated levels of axin. The Wnt signalling pathway, which is negatively regulated by axin, is important in axis formation in early development and impaired regulation of this signalling pathway is often involved in tumors. A decreased level of expression of this gene in tumors compared to the level of expression in their corresponding normal tissues suggests that this gene product has a tumor suppressor function. Alternative splicing results in multiple transcript variants. NA
uncharacterized LOC105379695 ENSG00000272273 LOC105379695 105379695 NA NA
NA ENSG00000229808 RP11-456P18.2 ENSG00000229808 NA NA
chromosome 3 open reading frame 52 ENSG00000114529 C3orf52 79669 NA NA
cyclin-dependent kinase inhibitor 1A ENSG00000124762 CDKN1A 1026 This gene encodes a potent cyclin-dependent kinase inhibitor. The encoded protein binds to and inhibits the activity of cyclin-cyclin-dependent kinase2 or -cyclin-dependent kinase4 complexes, and thus functions as a regulator of cell cycle progression at G1. The expression of this gene is tightly controlled by the tumor suppressor protein p53, through which this protein mediates the p53-dependent cell cycle G1 phase arrest in response to a variety of stress stimuli. This protein can interact with proliferating cell nuclear antigen, a DNA polymerase accessory factor, and plays a regulatory role in S phase DNA replication and DNA damage repair. This protein was reported to be specifically cleaved by CASP3-like caspases, which thus leads to a dramatic activation of cyclin-dependent kinase2, and may be instrumental in the execution of apoptosis following caspase activation. Mice that lack this gene have the ability to regenerate damaged or missing tissue. Multiple alternatively spliced variants have been found for this gene. NA
nuclear factor kappa B subunit 2 ENSG00000077150 NFKB2 4791 This gene encodes a subunit of the transcription factor complex nuclear factor-kappa-B (NFkB). The NFkB complex is expressed in numerous cell types and functions as a central activator of genes involved in inflammation and immune function. The protein encoded by this gene can function as both a transcriptional activator or repressor depending on its dimerization partner. The p100 full-length protein is co-translationally processed into a p52 active form. Chromosomal rearrangements and translocations of this locus have been observed in B cell lymphomas, some of which may result in the formation of fusion proteins. There is a pseudogene for this gene on chromosome 18. Alternative splicing results in multiple transcript variants. NA
small nucleolar RNA, H/ACA box 73B ENSG00000200087 SNORA73B ENSG00000200087 NA NA
solute carrier family 7 member 5 ENSG00000103257 SLC7A5 8140 NA NA
NA ENSG00000182368 NA NA NA TRUE
BCL2 related protein A1 ENSG00000140379 BCL2A1 597 This gene encodes a member of the BCL-2 protein family. The proteins of this family form hetero- or homodimers and act as anti- and pro-apoptotic regulators that are involved in a wide variety of cellular activities such as embryonic development, homeostasis and tumorigenesis. The protein encoded by this gene is able to reduce the release of pro-apoptotic cytochrome c from mitochondria and block caspase activation. This gene is a direct transcription target of NF-kappa B in response to inflammatory mediators, and is up-regulated by different extracellular signals, such as granulocyte-macrophage colony-stimulating factor (GM-CSF), CD40, phorbol ester and inflammatory cytokine TNF and IL-1, which suggests a cytoprotective function that is essential for lymphocyte activation as well as cell survival. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
coiled-coil domain containing 150 pseudogene 1 ENSG00000256304 CCDC150P1 ENSG00000256304 NA NA
chitinase 3 like 1 ENSG00000133048 CHI3L1 1116 Chitinases catalyze the hydrolysis of chitin, which is an abundant glycopolymer found in insect exoskeletons and fungal cell walls. The glycoside hydrolase 18 family of chitinases includes eight human family members. This gene encodes a glycoprotein member of the glycosyl hydrolase 18 family. The protein lacks chitinase activity and is secreted by activated macrophages, chondrocytes, neutrophils and synovial cells. The protein is thought to play a role in the process of inflammation and tissue remodeling. NA
Pim-1 proto-oncogene, serine/threonine kinase ENSG00000137193 PIM1 5292 The protein encoded by this gene belongs to the Ser/Thr protein kinase family, and PIM subfamily. This gene is expressed primarily in B-lymphoid and myeloid cell lines, and is overexpressed in hematopoietic malignancies and in prostate cancer. It plays a role in signal transduction in blood cells, contributing to both cell proliferation and survival, and thus provides a selective advantage in tumorigenesis. Both the human and orthologous mouse genes have been reported to encode two isoforms (with preferential cellular localization) resulting from the use of alternative in-frame translation initiation codons, the upstream non-AUG (CUG) and downstream AUG codons (PMIDs:16186805, 1825810). NA
ZFP36 ring finger protein ENSG00000128016 ZFP36 7538 NA NA
interleukin 4 receptor ENSG00000077238 IL4R 3566 This gene encodes the alpha chain of the interleukin-4 receptor, a type I transmembrane protein that can bind interleukin 4 and interleukin 13 to regulate IgE production. The encoded protein also can bind interleukin 4 to promote differentiation of Th2 cells. A soluble form of the encoded protein can be produced by proteolysis of the membrane-bound protein, and this soluble form can inhibit IL4-mediated cell proliferation and IL5 upregulation by T-cells. Allelic variations in this gene have been associated with atopy, a condition that can manifest itself as allergic rhinitis, sinusitis, asthma, or eczema. Polymorphisms in this gene are also associated with resistance to human immunodeficiency virus type-1 infection. Alternate splicing results in multiple transcript variants. NA
activating transcription factor 3 ENSG00000162772 ATF3 467 This gene encodes a member of the mammalian activation transcription factor/cAMP responsive element-binding (CREB) protein family of transcription factors. This gene is induced by a variety of signals, including many of those encountered by cancer cells, and is involved in the complex process of cellular stress response. Multiple transcript variants encoding different isoforms have been found for this gene. It is possible that alternative splicing of this gene may be physiologically important in the regulation of target genes. NA
salt inducible kinase 1 ENSG00000142178 SIK1 150094 NA NA
Y-box binding protein 3 ENSG00000060138 YBX3 8531 NA NA
TNF alpha induced protein 3 ENSG00000118503 TNFAIP3 7128 This gene was identified as a gene whose expression is rapidly induced by the tumor necrosis factor (TNF). The protein encoded by this gene is a zinc finger protein and ubiquitin-editing enzyme, and has been shown to inhibit NF-kappa B activation as well as TNF-mediated apoptosis. The encoded protein, which has both ubiquitin ligase and deubiquitinase activities, is involved in the cytokine-mediated immune and inflammatory responses. Several transcript variants encoding the same protein have been found for this gene. NA
histone cluster 1, H1e ENSG00000168298 HIST1H1E 3008 Histones are basic nuclear proteins responsible for nucleosome structure of the chromosomal fiber in eukaryotes. Two molecules of each of the four core histones (H2A, H2B, H3, and H4) form an octamer, around which approximately 146 bp of DNA is wrapped in repeating units, called nucleosomes. The linker histone, H1, interacts with linker DNA between nucleosomes and functions in the compaction of chromatin into higher order structures. This gene is intronless and encodes a replication-dependent histone that is a member of the histone H1 family. Transcripts from this gene lack polyA tails but instead contain a palindromic termination element. This gene is found in the large histone gene cluster on chromosome 6. NA
MAF bZIP transcription factor F ENSG00000185022 MAFF 23764 The protein encoded by this gene is a basic leucine zipper (bZIP) transcription factor that lacks a transactivation domain. It is known to bind the US-2 DNA element in the promoter of the oxytocin receptor (OTR) gene and most likely heterodimerizes with other leucine zipper-containing proteins to enhance expression of the OTR gene during term pregnancy. The encoded protein can also form homodimers, and since it lacks a transactivation domain, the homodimer may act as a repressor of transcription. This gene may also be involved in the cellular stress response. Multiple transcript variants encoding two different isoforms have been found for this gene. NA
G protein-coupled receptor 84 ENSG00000139572 GPR84 53831 NA NA
basic helix-loop-helix family member e40 ENSG00000134107 BHLHE40 8553 This gene encodes a basic helix-loop-helix protein expressed in various tissues. The encoded protein can interact with ARNTL or compete for E-box binding sites in the promoter of PER1 and repress CLOCK/ARNTL’s transactivation of PER1. This gene is believed to be involved in the control of circadian rhythm and cell differentiation. NA
phorbol-12-myristate-13-acetate-induced protein 1 ENSG00000141682 PMAIP1 5366 NA NA
growth arrest and DNA damage inducible beta ENSG00000099860 GADD45B 4616 This gene is a member of a group of genes whose transcript levels are increased following stressful growth arrest conditions and treatment with DNA-damaging agents. The genes in this group respond to environmental stresses by mediating activation of the p38/JNK pathway. This activation is mediated via their proteins binding and activating MTK1/MEKK4 kinase, which is an upstream activator of both p38 and JNK MAPKs. The function of these genes or their protein products is involved in the regulation of growth and apoptosis. These genes are regulated by different mechanisms, but they are often coordinately expressed and can function cooperatively in inhibiting cell growth. NA
solute carrier family 2 member 3 ENSG00000059804 SLC2A3 6515 NA NA
NA ENSG00000262652 RP13-638C3.2 ENSG00000262652 NA NA
nicotinamide phosphoribosyltransferase ENSG00000105835 NAMPT 10135 This gene encodes a protein that catalyzes the condensation of nicotinamide with 5-phosphoribosyl-1-pyrophosphate to yield nicotinamide mononucleotide, one step in the biosynthesis of nicotinamide adenine dinucleotide. The protein belongs to the nicotinic acid phosphoribosyltransferase (NAPRTase) family and is thought to be involved in many important biological processes, including metabolism, stress response and aging. This gene has a pseudogene on chromosome 10. NA
uncharacterized LOC100506142 ENSG00000250116 LOC100506142 100506142 NA NA
interleukin 1 beta ENSG00000125538 IL1B 3553 The protein encoded by this gene is a member of the interleukin 1 cytokine family. This cytokine is produced by activated macrophages as a proprotein, which is proteolytically processed to its active form by caspase 1 (CASP1/ICE). This cytokine is an important mediator of the inflammatory response, and is involved in a variety of cellular activities, including cell proliferation, differentiation, and apoptosis. The induction of cyclooxygenase-2 (PTGS2/COX2) by this cytokine in the central nervous system (CNS) is found to contribute to inflammatory pain hypersensitivity. This gene and eight other interleukin 1 family genes form a cytokine gene cluster on chromosome 2. NA
NA ENSG00000270640 RP11-373D23.2 ENSG00000270640 NA NA
small nucleolar RNA, H/ACA box 31 ENSG00000199477 SNORA31 677814 NA NA
dual specificity phosphatase 2 ENSG00000158050 DUSP2 1844 The protein encoded by this gene is a member of the dual specificity protein phosphatase subfamily. These phosphatases inactivate their target kinases by dephosphorylating both the phosphoserine/threonine and phosphotyrosine residues. They negatively regulate members of the mitogen-activated protein (MAP) kinase superfamily (MAPK/ERK, SAPK/JNK, p38), which are associated with cellular proliferation and differentiation. Different members of the family of dual specificity phosphatases show distinct substrate specificities for various MAP kinases, different tissue distribution and subcellular localization, and different modes of inducibility of their expression by extracellular stimuli. This gene product inactivates ERK1 and ERK2, is predominantly expressed in hematopoietic tissues, and is localized in the nucleus. NA
nicotinamide phosphoribosyltransferase pseudogene 1 ENSG00000229644 NAMPTP1 ENSG00000229644 NA NA
RANBP2-like and GRIP domain containing 2 ENSG00000185304 RGPD2 729857 NA NA
NA ENSG00000268903 RP11-34P13.15 ENSG00000268903 NA NA
HUS1 checkpoint clamp component B ENSG00000188996 HUS1B 135458 The protein encoded by this gene is most closely related to HUS1, a component of a cell cycle checkpoint protein complex involved in cell cycle arrest in response to DNA damage. This protein can interact with the check point protein RAD1 but not with RAD9. Overexpression of this protein has been shown to induce cell death, which suggests a related but distinct role of this protein, as compared to the HUS1. NA
NA ENSG00000224376 AC017104.6 ENSG00000224376 NA NA
NA ENSG00000260466 RP4-536B24.2 ENSG00000260466 NA NA
NA ENSG00000226396 RP5-1056L3.3 ENSG00000226396 NA NA
charged multivesicular body protein 4B pseudogene 1 ENSG00000258469 CHMP4BP1 ENSG00000258469 NA NA
DnaJ heat shock protein family (Hsp40) member B1 ENSG00000132002 DNAJB1 3337 This gene encodes a member of the DnaJ or Hsp40 (heat shock protein 40 kD) family of proteins. DNAJ family members are characterized by a highly conserved amino acid stretch called the ‘J-domain’ and function as one of the two major classes of molecular chaperones involved in a wide range of cellular events, such as protein folding and oligomeric protein complex assembly. The encoded protein is a molecular chaperone that stimulates the ATPase activity of Hsp70 heat-shock proteins in order to promote protein folding and prevent misfolded protein aggregation. Alternative splicing results in multiple transcript variants. NA
strawberry notch homolog 2 (Drosophila) ENSG00000064932 SBNO2 22904 NA NA
JunB proto-oncogene, AP-1 transcription factor subunit ENSG00000171223 JUNB 3726 NA NA
NA ENSG00000255513 AC005363.9 ENSG00000255513 NA NA
NA ENSG00000273284 RP11-888D10.4 ENSG00000273284 NA NA
NA ENSG00000223461 AC004471.9 ENSG00000223461 NA NA
regulatory factor X2 ENSG00000087903 RFX2 5990 This gene is a member of the regulatory factor X gene family, which encodes transcription factors that contain a highly-conserved winged helix DNA binding domain. The protein encoded by this gene is structurally related to regulatory factors X1, X3, X4, and X5. It is a transcriptional activator that can bind DNA as a monomer or as a heterodimer with other RFX family members. This protein can bind to cis elements in the promoter of the IL-5 receptor alpha gene. Two transcript variants encoding different isoforms have been described for this gene, and both variants utilize alternative polyadenylation sites. NA
NA ENSG00000212743 RP11-563J2.3 ENSG00000212743 NA NA
NA ENSG00000269952 RP11-324I22.3 ENSG00000269952 NA NA
v-myc avian myelocytomatosis viral oncogene homolog ENSG00000136997 MYC 4609 The protein encoded by this gene is a multifunctional, nuclear phosphoprotein that plays a role in cell cycle progression, apoptosis and cellular transformation. It functions as a transcription factor that regulates transcription of specific target genes. Mutations, overexpression, rearrangement and translocation of this gene have been associated with a variety of hematopoietic tumors, leukemias and lymphomas, including Burkitt lymphoma. There is evidence to show that alternative translation initiations from an upstream, in-frame non-AUG (CUG) and a downstream AUG start site result in the production of two isoforms with distinct N-termini. The synthesis of non-AUG initiated protein is suppressed in Burkitt’s lymphomas, suggesting its importance in the normal function of this gene. NA
small nucleolar RNA, H/ACA box 7A ENSG00000207496 SNORA7A 619563 NA NA
NA ENSG00000236047 AC073410.1 ENSG00000236047 NA NA
azurocidin 1 ENSG00000172232 AZU1 566 Azurophil granules, specialized lysosomes of the neutrophil, contain at least 10 proteins implicated in the killing of microorganisms. This gene encodes a preproprotein that is proteolytically processed to generate a mature azurophil granule antibiotic protein, with monocyte chemotactic and antimicrobial activity. It is also an important multifunctional inflammatory mediator. This encoded protein is a member of the serine protease gene family but it is not a serine proteinase, because the active site serine and histidine residues are replaced. The genes encoding this protein, neutrophil elastase 2, and proteinase 3 are in a cluster located at chromosome 19pter. All 3 genes are expressed coordinately and their protein products are packaged together into azurophil granules during neutrophil differentiation. NA
nuclear factor, interleukin 3 regulated ENSG00000165030 NFIL3 4783 The protein encoded by this gene is a transcriptional regulator that binds as a homodimer to activating transcription factor (ATF) sites in many cellular and viral promoters. The encoded protein represses PER1 and PER2 expression and therefore plays a role in the regulation of circadian rhythm. Three transcript variants encoding the same protein have been found for this gene. NA
F-box and WD repeat domain containing 4 pseudogene 1 ENSG00000230701 FBXW4P1 26226 NA NA
plasminogen activator, urokinase receptor ENSG00000011422 PLAUR 5329 This gene encodes the receptor for urokinase plasminogen activator and, given its role in localizing and promoting plasmin formation, likely influences many normal and pathological processes related to cell-surface plasminogen activation and localized degradation of the extracellular matrix. It binds both the proprotein and mature forms of urokinase plasminogen activator and permits the activation of the receptor-bound pro-enzyme by plasmin. The protein lacks transmembrane or cytoplasmic domains and may be anchored to the plasma membrane by a glycosyl-phosphatidylinositol (GPI) moiety following cleavage of the nascent polypeptide near its carboxy-terminus. However, a soluble protein is also produced in some cell types. Alternative splicing results in multiple transcript variants encoding different isoforms. The proprotein experiences several post-translational cleavage reactions that have not yet been fully defined. NA
intercellular adhesion molecule 1 ENSG00000090339 ICAM1 3383 This gene encodes a cell surface glycoprotein which is typically expressed on endothelial cells and cells of the immune system. It binds to integrins of type CD11a / CD18, or CD11b / CD18 and is also exploited by Rhinovirus as a receptor. NA
NA ENSG00000224114 RP11-343H5.4 ENSG00000224114 NA NA
UBE2R2 antisense RNA 1 ENSG00000235481 UBE2R2-AS1 ENSG00000235481 NA NA
RELB proto-oncogene, NF-kB subunit ENSG00000104856 RELB 5971 NA NA
small nucleolar RNA, H/ACA box 25 ENSG00000207112 SNORA25 684959 NA NA
NA ENSG00000197697 NA NA NA TRUE
chymotrypsin like ENSG00000141086 CTRL 1506 NA NA
ring finger protein 122 ENSG00000133874 RNF122 79845 The encoded protein contains a RING finger, a motif present in a variety of functionally distinct proteins and known to be involved in protein-protein and protein-DNA interactions. The encoded protein is localized to the endoplasmic reticulum and golgi apparatus, and may be associated with cell viability. NA
NA ENSG00000267607 CTD-2369P2.8 ENSG00000267607 NA NA
solute carrier family 11 member 1 ENSG00000018280 SLC11A1 6556 This gene is a member of the solute carrier family 11 (proton-coupled divalent metal ion transporters) family and encodes a multi-pass membrane protein. The protein functions as a divalent transition metal (iron and manganese) transporter involved in iron metabolism and host resistance to certain pathogens. Mutations in this gene have been associated with susceptibility to infectious diseases such as tuberculosis and leprosy, and inflammatory diseases such as rheumatoid arthritis and Crohn disease. Alternatively spliced variants that encode different protein isoforms have been described but the full-length nature of only one has been determined. NA
phosphatidylinositol glycan anchor biosynthesis class H pseudogene 1 ENSG00000259657 PIGHP1 ENSG00000259657 NA NA
small nucleolar RNA, H/ACA box 64 ENSG00000207405 SNORA64 26784 NA NA
triggering receptor expressed on myeloid cells 1 ENSG00000124731 TREM1 54210 This gene encodes a receptor belonging to the Ig superfamily that is expressed on myeloid cells. This protein amplifies neutrophil and monocyte-mediated inflammatory responses triggered by bacterial and fungal infections by stimulating release of pro-inflammatory chemokines and cytokines, as well as increased surface expression of cell activation markers. Alternatively spliced transcript variants encoding different isoforms have been noted for this gene. NA
BTG family member 3 ENSG00000154640 BTG3 10950 The protein encoded by this gene is a member of the BTG/Tob family. This family has structurally related proteins that appear to have antiproliferative properties. This encoded protein might play a role in neurogenesis in the central nervous system. Two transcript variants encoding different isoforms have been found for this gene. NA
NME/NM23 nucleoside diphosphate kinase 2 pseudogene 1 ENSG00000123009 NME2P1 ENSG00000123009 NA NA
transmembrane protein 217 ENSG00000172738 TMEM217 221468 NA NA
NA ENSG00000273320 RP11-22N19.2 ENSG00000273320 NA NA
small nucleolar RNA, C/D box 10 ENSG00000238917 SNORD10 ENSG00000238917 NA NA
NA ENSG00000179294 NA NA NA TRUE
NA ENSG00000269463 RP11-727F15.13 ENSG00000269463 NA NA
C-X-C motif chemokine ligand 1 ENSG00000163739 CXCL1 2919 This antimicrobial gene encodes a member of the CXC subfamily of chemokines. The encoded protein is a secreted growth factor that signals through the G-protein coupled receptor, CXC receptor 2. This protein plays a role in inflammation and as a chemoattractant for neutrophils. Aberrant expression of this protein is associated with the growth and progression of certain tumors. A naturally occurring processed form of this protein has increased chemotactic activity. Alternate splicing results in coding and non-coding variants of this gene. A pseudogene of this gene is found on chromosome 4. NA
NA ENSG00000262147 RP13-638C3.3 ENSG00000262147 NA NA
NA ENSG00000255843 AP000593.7 ENSG00000255843 NA NA
NA ENSG00000225721 RP11-269F19.2 ENSG00000225721 NA NA
tandem C2 domains, nuclear ENSG00000165929 TC2N 123036 NA NA
aquaporin 9 ENSG00000103569 AQP9 366 The aquaporins are a family of water-selective membrane channels. This gene encodes a member of a subset of aquaporins called the aquaglyceroporins. This protein allows passage of a broad range of noncharged solutes and also stimulates urea transport and osmotic water permeability. This protein may also facilitate the uptake of glycerol in hepatic tissue. The encoded protein may also play a role in specialized leukocyte functions such as immunological response and bactericidal activity. Alternate splicing results in multiple transcript variants. NA
histone cluster 2, H2bf ENSG00000203814 HIST2H2BF 440689 Histones are basic nuclear proteins that are responsible for the nucleosome structure of the chromosomal fiber in eukaryotes. This structure consists of approximately 146 bp of DNA wrapped around a nucleosome, an octamer composed of pairs of each of the four core histones (H2A, H2B, H3, and H4). The chromatin fiber is further compacted through the interaction of a linker histone, H1, with the DNA between the nucleosomes to form higher order chromatin structures. This gene encodes a replication-dependent histone that is a member of the histone H2B family and is found in a histone cluster on chromosome 1. NA
NA ENSG00000204807 NA NA NA TRUE
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 10, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
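The write step above is repeated for every factor with a hard-coded index. A minimal sketch of a reusable helper follows, assuming one query ID per line is the desired layout. The function name `write_factor_genes` and the `out_dir` argument are hypothetical, and the demo writes to a temporary directory rather than `../utilities/`.

```r
# Hypothetical helper: write one factor's query IDs to
# <out_dir>/gene_names_clus_<k>.txt, one ID per line.
write_factor_genes <- function(query_ids, k, out_dir) {
  stopifnot(length(k) == 1, dir.exists(out_dir))
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  # as.character() guards against factor input; writeLines() emits one ID per
  # line, with no header, row names, or quotes.
  writeLines(as.character(query_ids), path)
  invisible(path)
}

# Toy usage with three Ensembl IDs from the Factor 10 table (RFX2, MYC, AZU1),
# written to a temporary directory.
demo_ids <- c("ENSG00000087903", "ENSG00000136997", "ENSG00000172232")
demo_path <- write_factor_genes(demo_ids, k = 10, out_dir = tempdir())
readLines(demo_path)
```

In the report itself the call would presumably take `out$query` and the factor index, for example `write_factor_genes(out$query, 10, "../utilities/GTEX2013_sparse_load_voom")`.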

Factor 11 Annotations

out <- mygene::queryMany(gene_list[11, ], scopes = "ensembl.gene", fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
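The message above points to `returnall=TRUE` for listing duplicate or missing query terms. Without rerunning the query, roughly the same information can be recovered from the query vector and the `notfound` column of the result. The sketch below uses a toy stand-in (`toy_query`, `toy_out`) built from three IDs in the Factor 11 table. The repeated ID in `toy_query` is an assumption added only to illustrate duplicate detection; it does not occur in the real query.

```r
# Toy query vector; the repeated first ID is an illustrative assumption.
toy_query <- c("ENSG00000169347", "ENSG00000134548",
               "ENSG00000184674", "ENSG00000169347")

# Toy stand-in for the result data frame: one row per unique query term.
toy_out <- data.frame(
  query    = c("ENSG00000169347", "ENSG00000134548", "ENSG00000184674"),
  symbol   = c("GP2", "SPX", NA),
  notfound = c(NA, NA, TRUE),
  stringsAsFactors = FALSE
)

# Query terms that were submitted more than once.
dup_terms <- unique(toy_query[duplicated(toy_query)])

# Query terms flagged as not found (notfound is NA for resolved terms).
missing_terms <- toy_out$query[!is.na(toy_out$notfound) & toy_out$notfound]

# Rows that resolved to an annotation record.
resolved <- toy_out[!(toy_out$query %in% missing_terms), ]
```

Dropping the unresolved rows before tabulating keeps the all-`NA` entries out of the annotation table, at the cost of no longer showing which of the top genes had no match.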
kable(as.data.frame(out))
X_id symbol summary query name notfound
2813 GP2 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. ENSG00000169347 glycoprotein 2 NA
80763 SPX The protein encoded by this gene is a hormone involved in modulation of cardiovascular and renal function. It has also been shown in rats to cause weight loss. Several transcript variants have been found for this gene. ENSG00000134548 spexin hormone NA
5967 REG1A This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. ENSG00000115386 regenerating family member 1 alpha NA
5644 PRSS1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. ENSG00000204983 protease, serine 1 NA
10136 CELA3A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. ENSG00000142789 chymotrypsin like elastase family member 3A NA
5284 PIGR This gene is a member of the immunoglobulin superfamily. The encoded poly-Ig receptor binds polymeric immunoglobulin molecules at the basolateral surface of epithelial cells; the complex is then transported across the cell to be secreted at the apical surface. A significant association was found between immunoglobulin A nephropathy and several SNPs in this gene. ENSG00000162896 polymeric immunoglobulin receptor NA
5406 PNLIP This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. ENSG00000175535 pancreatic lipase NA
5319 PLA2G1B This gene encodes a secreted member of the phospholipase A2 (PLA2) class of enzymes, which is produced by the pancreatic acinar cells. The encoded calcium-dependent enzyme catalyzes the hydrolysis of the sn-2 position of membrane glycerophospholipids to release arachidonic acid (AA) and lysophospholipids. AA is subsequently converted by downstream metabolic enzymes to several bioactive lipophilic compounds (eicosanoids), including prostaglandins (PGs) and leukotrienes (LTs). The enzyme may be involved in several physiological processes including cell contraction, cell proliferation and pathological response. ENSG00000170890 phospholipase A2 group IB NA
23436 CELA3B Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3B has little elastolytic activity. Like most of the human elastases, elastase 3B is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3B preferentially cleaves proteins after alanine residues. Elastase 3B may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1, and excretion of this protein in fecal material is frequently used as a measure of pancreatic function in clinical assays. ENSG00000219073 chymotrypsin like elastase family member 3B NA
2244 FGB The protein encoded by this gene is the beta component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including afibrinogenemia, dysfibrinogenemia, hypodysfibrinogenemia and thrombotic tendency. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000171564 fibrinogen beta chain NA
ENSG00000249790 RP11-20D14.6 NA ENSG00000249790 NA NA
1357 CPA1 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. ENSG00000091704 carboxypeptidase A1 NA
ENSG00000272030 RP1-178F15.4 NA ENSG00000272030 NA NA
1208 CLPS The protein encoded by this gene is a cofactor needed by pancreatic lipase for efficient dietary lipid hydrolysis. It binds to the C-terminal, non-catalytic domain of lipase, thereby stabilizing an active conformation and considerably increasing the overall hydrophobic binding site. The gene product allows lipase to anchor noncovalently to the surface of lipid micelles, counteracting the destabilizing influence of intestinal bile salts. This cofactor is only expressed in pancreatic acinar cells, suggesting regulation of expression by tissue-specific elements. Three transcript variants encoding different isoforms have been found for this gene. ENSG00000137392 colipase NA
342898 SYCN NA ENSG00000179751 syncollin NA
345 APOC3 Apolipoprotein C-III is a very low density lipoprotein (VLDL) protein. APOC3 inhibits lipoprotein lipase and hepatic lipase; it is thought to delay catabolism of triglyceride-rich particles. The APOA1, APOC3 and APOA4 genes are closely linked in both rat and human genomes. The A-I and A-IV genes are transcribed from the same strand, while the A-1 and C-III genes are convergently transcribed. An increase in apoC-III levels induces the development of hypertriglyceridemia. ENSG00000110245 apolipoprotein C3 NA
10417 SPON2 NA ENSG00000159674 spondin 2 NA
124220 ZG16B NA ENSG00000162078 zymogen granule protein 16B NA
4589 MUC7 This gene encodes a small salivary mucin, which is thought to play a role in facilitating the clearance of bacteria in the oral cavity and to aid in mastication, speech, and swallowing. The central domain of this glycoprotein contains tandem repeats, each composed of 23 amino acids. This antimicrobial protein has antibacterial and antifungal activity. The most common allele contains 6 repeats, and some alleles may be associated with susceptibility to asthma. Alternatively spliced transcript variants with different 5’ UTR, but encoding the same protein, have been found for this gene. ENSG00000171195 mucin 7, secreted NA
2335 FN1 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. ENSG00000115414 fibronectin 1 NA
NA NA NA ENSG00000184674 NA TRUE
57467 HHATL NA ENSG00000010282 hedgehog acyltransferase-like NA
1504 CTRB1 The protein encoded by this gene is one of a family of serine proteases that is secreted into the gastrointestinal tract as an inactive precursor, which is activated by proteolytic cleavage with trypsin. ENSG00000168925 chymotrypsinogen B1 NA
92421 CHMP4C CHMP4C belongs to the chromatin-modifying protein/charged multivesicular body protein (CHMP) family. These proteins are components of ESCRT-III (endosomal sorting complex required for transport III), a complex involved in degradation of surface receptor proteins and formation of endocytic multivesicular bodies (MVBs). Some CHMPs have both nuclear and cytoplasmic/vesicular distributions, and one such CHMP, CHMP1A (MIM 164010), is required for both MVB formation and regulation of cell cycle progression (Tsang et al., 2006 [PubMed 16730941]). ENSG00000164695 charged multivesicular body protein 4C NA
4900 NRGN Neurogranin (NRGN) is the human homolog of the neuron-specific rat RC3/neurogranin gene. This gene encodes a postsynaptic protein kinase substrate that binds calmodulin in the absence of calcium. The NRGN gene contains four exons and three introns. The exons 1 and 2 encode the protein and exons 3 and 4 contain untranslated sequences. It is suggested that the NRGN is a direct target for thyroid hormone in human brain, and that control of expression of this gene could underlie many of the consequences of hypothyroidism on mental states during development as well as in adult subjects. ENSG00000154146 neurogranin NA
NA NA NA ENSG00000250606 NA TRUE
5407 PNLIPRP1 NA ENSG00000187021 pancreatic lipase related protein 1 NA
6424 SFRP4 Secreted frizzled-related protein 4 (SFRP4) is a member of the SFRP family that contains a cysteine-rich domain homologous to the putative Wnt-binding site of Frizzled proteins. SFRPs act as soluble modulators of Wnt signaling. The expression of SFRP4 in ventricular myocardium correlates with apoptosis related gene expression. ENSG00000106483 secreted frizzled related protein 4 NA
91624 NEXN This gene encodes a filamentous actin-binding protein that may function in cell adhesion and migration. Mutations in this gene have been associated with dilated cardiomyopathy, also known as CMD1CC. Alternatively spliced transcript variants have been described. ENSG00000162614 nexilin F-actin binding protein NA
440387 CTRB2 NA ENSG00000168928 chymotrypsinogen B2 NA
63036 CELA2A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Like most of the human elastases, elastase 2A is secreted from the pancreas as a zymogen. In other species, elastase 2A has been shown to preferentially cleave proteins after leucine, methionine, and phenylalanine residues. ENSG00000142615 chymotrypsin like elastase family member 2A NA
10529 NEBL This gene encodes a nebulin like protein that is abundantly expressed in cardiac muscle. The encoded protein binds actin and interacts with thin filaments and Z-line associated proteins in striated muscle. This protein may be involved in cardiac myofibril assembly. A shorter isoform of this protein termed LIM nebulette is expressed in non-muscle cells and may function as a component of focal adhesion complexes. Alternate splicing results in multiple transcript variants. ENSG00000078114 nebulette NA
4632 MYL1 Myosin is a hexameric ATPase cellular motor protein. It is composed of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable regulatory light chains. This gene encodes a myosin alkali light chain expressed in fast skeletal muscle. Two transcript variants have been identified for this gene. ENSG00000168530 myosin light chain 1 NA
1289 COL5A1 This gene encodes an alpha chain for one of the low abundance fibrillar collagens. Fibrillar collagen molecules are trimers that can be composed of one or more types of alpha chains. Type V collagen is found in tissues containing type I collagen and appears to regulate the assembly of heterotypic fibers composed of both type I and type V collagen. This gene product is closely related to type XI collagen and it is possible that the collagen chains of types V and XI constitute a single collagen type with tissue-specific chain combinations. The encoded procollagen protein occurs commonly as the heterotrimer pro-alpha1(V)-pro-alpha1(V)-pro-alpha2(V). Mutations in this gene are associated with Ehlers-Danlos syndrome, types I and II. Alternative splicing of this gene results in multiple transcript variants. ENSG00000130635 collagen type V alpha 1 NA
ENSG00000273179 RP11-20I20.4 NA ENSG00000273179 NA NA
ENSG00000259279 CTD-2033D15.1 NA ENSG00000259279 NA NA
2318 FLNC This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000128591 filamin C NA
ENSG00000268649 RP4-806M20.4 NA ENSG00000268649 NA NA
4619 MYH1 Myosin is a major contractile protein which converts chemical energy into mechanical energy through the hydrolysis of ATP. Myosin is a hexameric protein composed of a pair of myosin heavy chains (MYH) and two pairs of nonidentical light chains. Myosin heavy chains are encoded by a multigene family. In mammals at least 10 different myosin heavy chain (MYH) isoforms have been described from striated, smooth, and nonmuscle cells. These isoforms show expression that is spatially and temporally regulated during development. ENSG00000109061 myosin, heavy chain 1, skeletal muscle, adult NA
1277 COL1A1 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. ENSG00000108821 collagen type I alpha 1 NA
100874214 TM4SF19-AS1 NA ENSG00000235897 TM4SF19 antisense RNA 1 NA
347735 SERINC2 NA ENSG00000168528 serine incorporator 2 NA
1291 COL6A1 The collagens are a superfamily of proteins that play a role in maintaining the integrity of various tissues. Collagens are extracellular matrix proteins and have a triple-helical domain as their common structural element. Collagen VI is a major structural component of microfibrils. The basic structural unit of collagen VI is a heterotrimer of the alpha1(VI), alpha2(VI), and alpha3(VI) chains. The alpha2(VI) and alpha3(VI) chains are encoded by the COL6A2 and COL6A3 genes, respectively. The protein encoded by this gene is the alpha 1 subunit of type VI collagen (alpha1(VI) chain). Mutations in the genes that code for the collagen VI subunits result in the autosomal dominant disorder, Bethlem myopathy. ENSG00000142156 collagen type VI alpha 1 NA
6442 SGCA This gene encodes a component of the dystrophin-glycoprotein complex (DGC), which is critical to the stability of muscle fiber membranes and to the linking of the actin cytoskeleton to the extracellular matrix. Its expression is thought to be restricted to striated muscle. Mutations in this gene result in type 2D autosomal recessive limb-girdle muscular dystrophy. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000108823 sarcoglycan alpha NA
ENSG00000212743 RP11-563J2.3 NA ENSG00000212743 NA NA
643834 PGA3 This gene encodes a protein precursor of the digestive enzyme pepsin, a member of the peptidase A1 family of endopeptidases. The encoded precursor is secreted by gastric chief cells and undergoes autocatalytic cleavage in acidic conditions to form the active enzyme, which functions in the digestion of dietary proteins. This gene is found in a cluster of related genes on chromosome 11, each of which encodes one of multiple pepsinogens. Pepsinogen levels in serum may serve as a biomarker for atrophic gastritis and gastric cancer. ENSG00000229859 pepsinogen 3, group I (pepsinogen A) NA
ENSG00000250011 HMGB1P3 NA ENSG00000250011 high mobility group box 1 pseudogene 3 NA
10103 TSPAN1 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. ENSG00000117472 tetraspanin 1 NA
2243 FGA This gene encodes the alpha subunit of the coagulation factor fibrinogen, which is a component of the blood clot. Following vascular injury, the encoded preproprotein is proteolytically processed by thrombin during the conversion of fibrinogen to fibrin. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia, afibrinogenemia and renal amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. ENSG00000171560 fibrinogen alpha chain NA
5004 ORM1 This gene encodes a key acute phase plasma protein. Because of its increase due to acute inflammation, this protein is classified as an acute-phase reactant. The specific function of this protein has not yet been determined; however, it may be involved in aspects of immunosuppression. ENSG00000229314 orosomucoid 1 NA
NA NA NA ENSG00000272403 NA TRUE
NA NA NA ENSG00000197262 NA TRUE
10924 SMPDL3A NA ENSG00000172594 sphingomyelin phosphodiesterase acid like 3A NA
6366 CCL21 This antimicrobial gene is one of several CC cytokine genes clustered on the p-arm of chromosome 9. Cytokines are a family of secreted proteins involved in immunoregulatory and inflammatory processes. The CC cytokines are proteins characterized by two adjacent cysteines. Similar to other chemokines the protein encoded by this gene inhibits hemopoiesis and stimulates chemotaxis. This protein is chemotactic in vitro for thymocytes and activated T cells, but not for B cells, macrophages, or neutrophils. The cytokine encoded by this gene may also play a role in mediating homing of lymphocytes to secondary lymphoid organs. It is a high affinity functional ligand for chemokine receptor 7 that is expressed on T and B lymphocytes and a known receptor for another member of the cytokine family (small inducible cytokine A19). ENSG00000137077 C-C motif chemokine ligand 21 NA
57644 MYH7B The myosin II molecule is a multi-subunit complex consisting of two heavy chains and four light chains. This gene encodes a heavy chain of myosin II, which is a member of the motor-domain superfamily. The heavy chain includes a globular motor domain, which catalyzes ATP hydrolysis and interacts with actin, and a tail domain in which heptad repeat sequences promote dimerization by interacting to form a rod-like alpha-helical coiled coil. This heavy chain subunit is a slow-twitch myosin. Alternatively spliced transcript variants have been found, but the full-length nature of these variants is not determined. ENSG00000078814 myosin, heavy chain 7B, cardiac muscle, beta NA
1292 COL6A2 This gene encodes one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The product of this gene contains several domains similar to von Willebrand Factor type A domains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in this gene are associated with Bethlem myopathy and Ullrich scleroatonic muscular dystrophy. Three transcript variants have been identified for this gene. ENSG00000142173 collagen type VI alpha 2 NA
5742 PTGS1 This is one of two genes encoding similar enzymes that catalyze the conversion of arachidonate to prostaglandin. The encoded protein regulates angiogenesis in endothelial cells, and is inhibited by nonsteroidal anti-inflammatory drugs such as aspirin. Based on its ability to function as both a cyclooxygenase and as a peroxidase, the encoded protein has been identified as a moonlighting protein. The protein may promote cell proliferation during tumor progression. Alternative splicing results in multiple transcript variants. ENSG00000095303 prostaglandin-endoperoxide synthase 1 NA
9953 HS3ST3B1 The protein encoded by this gene is a type II integral membrane protein that belongs to the 3-O-sulfotransferases family. These proteins catalyze the addition of sulfate groups at the 3-OH position of glucosamine in heparan sulfate. The substrate specificity of individual members of the family is based on prior modification of the heparan sulfate chain, thus allowing different members of the family to generate binding sites for different proteins on the same heparan sulfate chain. Following treatment with a histone deacetylase inhibitor, expression of this gene is activated in a pancreatic cell line. The increased expression results in promotion of the epithelial-mesenchymal transition. In addition, the modification catalyzed by this protein allows herpes simplex virus membrane fusion and penetration. A very closely related homolog with an almost identical sulfotransferase domain maps less than 1 Mb away. Alternative splicing results in multiple transcript variants. ENSG00000125430 heparan sulfate-glucosamine 3-sulfotransferase 3B1 NA
6712 SPTBN2 Spectrins are principal components of a cell’s membrane-cytoskeleton and are composed of two alpha and two beta spectrin subunits. The protein encoded by this gene (SPTBN2) is called spectrin beta non-erythrocytic 2 or beta-III spectrin. It is related to, but distinct from, the beta-II spectrin gene which is also known as spectrin beta non-erythrocytic 1 (SPTBN1). SPTBN2 regulates the glutamate signaling pathway by stabilizing the glutamate transporter EAAT4 at the surface of the plasma membrane. Mutations in this gene cause a form of spinocerebellar ataxia, SCA5, that is characterized by neurodegeneration, progressive locomotor incoordination, dysarthria, and uncoordinated eye movements. ENSG00000173898 spectrin beta, non-erythrocytic 2 NA
57699 CPNE5 Calcium-dependent membrane-binding proteins may regulate molecular events at the interface of the cell membrane and cytoplasm. This gene is one of several genes that encode a calcium-dependent protein containing two N-terminal type II C2 domains and an integrin A domain-like sequence in the C-terminus. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene. More variants may exist, but their full-length natures could not be determined. ENSG00000124772 copine 5 NA
4017 LOXL2 This gene encodes a member of the lysyl oxidase gene family. The prototypic member of the family is essential to the biogenesis of connective tissue, encoding an extracellular copper-dependent amine oxidase that catalyses the first step in the formation of crosslinks in collagens and elastin. A highly conserved amino acid sequence at the C-terminus end appears to be sufficient for amine oxidase activity, suggesting that each family member may retain this function. The N-terminus is poorly conserved and may impart additional roles in developmental regulation, senescence, tumor suppression, cell growth control, and chemotaxis to each member of the family. ENSG00000134013 lysyl oxidase like 2 NA
219537 SMTNL1 SMTNL1, which is a member of the smoothelin (SMTN; MIM 602127) family, regulates contraction and relaxation of skeletal and smooth muscle fibers and mediates vascular adaptation to exercise (Wooldridge et al., 2008 [PubMed 18310078]). ENSG00000214872 smoothelin like 1 NA
7057 THBS1 The protein encoded by this gene is a subunit of a disulfide-linked homotrimeric protein. This protein is an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein can bind to fibrinogen, fibronectin, laminin, type V collagen and integrins alpha-V/beta-1. This protein has been shown to play roles in platelet aggregation, angiogenesis, and tumorigenesis. ENSG00000137801 thrombospondin 1 NA
ENSG00000258376 RP4-647C14.2 NA ENSG00000258376 NA NA
4881 NPR1 Guanylyl cyclases, catalyzing the production of cGMP from GTP, are classified as soluble and membrane forms (Garbers and Lowe, 1994 [PubMed 7982997]). The membrane guanylyl cyclases, often termed guanylyl cyclases A through F, form a family of cell-surface receptors with a similar topographic structure: an extracellular ligand-binding domain, a single membrane-spanning domain, and an intracellular region that contains a protein kinase-like domain and a cyclase catalytic domain. GC-A and GC-B function as receptors for natriuretic peptides; they are also referred to as atrial natriuretic peptide receptor A (NPR1) and type B (NPR2; MIM 108961). Also see NPR3 (MIM 108962), which encodes a protein with only the ligand-binding transmembrane and 37-amino acid cytoplasmic domains. NPR1 is a membrane-bound guanylate cyclase that serves as the receptor for both atrial and brain natriuretic peptides (ANP (MIM 108780) and BNP (MIM 600295), respectively). ENSG00000169418 natriuretic peptide receptor 1 NA
1299 COL9A3 This gene encodes one of the three alpha chains of type IX collagen, the major collagen component of hyaline cartilage. Type IX collagen, a heterotrimeric molecule, is usually found in tissues containing type II collagen, a fibrillar collagen. Mutations in this gene are associated with multiple epiphyseal dysplasia type 3. ENSG00000092758 collagen type IX alpha 3 NA
ENSG00000254680 RP11-265D17.2 NA ENSG00000254680 NA NA
284297 SSC5D NA ENSG00000179954 scavenger receptor cysteine rich family member with 5 domains NA
1360 CPB1 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma. ENSG00000153002 carboxypeptidase B1 NA
4313 MMP2 This gene is a member of the matrix metalloproteinase (MMP) gene family, that are zinc-dependent enzymes capable of cleaving components of the extracellular matrix and molecules involved in signal transduction. The protein encoded by this gene is a gelatinase A, type IV collagenase, that contains three fibronectin type II repeats in its catalytic site that allow binding of denatured type IV and V collagen and elastin. Unlike most MMP family members, activation of this protein can occur on the cell membrane. This enzyme can be activated extracellularly by proteases, or intracellularly by its S-glutathiolation with no requirement for proteolytical removal of the pro-domain. This protein is thought to be involved in multiple pathways including roles in the nervous system, endometrial menstrual breakdown, regulation of vascularization, and metastasis. Mutations in this gene have been associated with Winchester syndrome and Nodulosis-Arthropathy-Osteolysis (NAO) syndrome. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000087245 matrix metallopeptidase 2 NA
23657 SLC7A11 This gene encodes a member of a heteromeric, sodium-independent, anionic amino acid transport system that is highly specific for cysteine and glutamate. In this system, designated Xc(-), the anionic form of cysteine is transported in exchange for glutamate. This protein has been identified as the predominant mediator of Kaposi sarcoma-associated herpesvirus fusion and entry permissiveness into cells. Also, increased expression of this gene in primary gliomas (compared to normal brain tissue) was associated with increased glutamate secretion via the XCT channels, resulting in neuronal cell death. ENSG00000151012 solute carrier family 7 member 11 NA
ENSG00000264272 CTD-2514K5.4 NA ENSG00000264272 NA NA
89765 RSPH1 This gene encodes a male meiotic metaphase chromosome-associated acidic protein. This gene is expressed in tissues with motile cilia or flagella, including the trachea, lungs, airway brushings, and testes. Mutations in this gene result in primary ciliary dyskinesia-24. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000160188 radial spoke head 1 homolog NA
729238 SFTPA2 This gene is one of several genes encoding pulmonary-surfactant associated proteins (SFTPA) located on chromosome 10. Mutations in this gene and a highly similar gene located nearby, which affect the highly conserved carbohydrate recognition domain, are associated with idiopathic pulmonary fibrosis. The current version of the assembly displays only a single centromeric SFTPA gene pair rather than the two gene pairs shown in the previous assembly which were thought to have resulted from a duplication. ENSG00000185303 surfactant protein A2 NA
643866 CBLN3 Members of the precerebellin family, such as CBLN3, contain a cerebellin motif (see CBLN1; MIM 600432) and a C-terminal C1q signature domain (see MIM 120550) that mediates trimeric assembly of atypical collagen complexes. However, precerebellins do not contain a collagen motif, suggesting that they are not conventional components of the extracellular matrix (Pang et al., 2000 [PubMed 10964938]). ENSG00000139899 cerebellin 3 precursor NA
ENSG00000224497 RPL36P4 NA ENSG00000224497 ribosomal protein L36 pseudogene 4 NA
84959 UBASH3B This gene encodes a protein that contains a ubiquitin associated domain at the N-terminus, an SH3 domain, and a C-terminal domain with similarities to the catalytic motif of phosphoglycerate mutase. The encoded protein was found to inhibit endocytosis of epidermal growth factor receptor (EGFR) and platelet-derived growth factor receptor. ENSG00000154127 ubiquitin associated and SH3 domain containing B NA
ENSG00000272512 RP11-54O7.17 NA ENSG00000272512 NA NA
29923 HILPDA NA ENSG00000135245 hypoxia inducible lipid droplet associated NA
3960 LGALS4 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. The expression of this gene is restricted to small intestine, colon, and rectum, and it is underexpressed in colorectal cancer. ENSG00000171747 galectin 4 NA
100127983 C8orf88 NA ENSG00000253250 chromosome 8 open reading frame 88 NA
79695 GALNT12 This gene encodes a member of a family of UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferases, which catalyze the transfer of N-acetylgalactosamine (GalNAc) from UDP-GalNAc to a serine or threonine residue on a polypeptide acceptor in the initial step of O-linked protein glycosylation. Mutations in this gene are associated with an increased susceptibility to colorectal cancer. ENSG00000119514 polypeptide N-acetylgalactosaminyltransferase 12 NA
3902 LAG3 Lymphocyte-activation protein 3 belongs to Ig superfamily and contains 4 extracellular Ig-like domains. The LAG3 gene contains 8 exons. The sequence data, exon/intron organization, and chromosomal localization all indicate a close relationship of LAG3 to CD4. ENSG00000089692 lymphocyte activating 3 NA
1278 COL1A2 This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. ENSG00000164692 collagen type I alpha 2 chain NA
3678 ITGA5 The product of this gene belongs to the integrin alpha chain family. Integrins are heterodimeric integral membrane proteins composed of an alpha subunit and a beta subunit that function in cell surface adhesion and signaling. The encoded preproprotein is proteolytically processed to generate light and heavy chains that comprise the alpha 5 subunit. This subunit associates with the beta 1 subunit to form a fibronectin receptor. This integrin may promote tumor invasion, and higher expression of this gene may be correlated with shorter survival time in lung cancer patients. Note that the integrin alpha 5 and integrin alpha V subunits are encoded by distinct genes. ENSG00000161638 integrin subunit alpha 5 NA
257000 TINCR This gene produces a spliced long non-coding RNA that is required for normal epidermal differentiation. This transcript regulates the expression of genes involved in the differentiation of epidermal tissue. Mutations in some of the genes targeted by this transcript have been implicated in epidermal skin diseases. ENSG00000223573 tissue differentiation-inducing non-protein coding RNA NA
8513 LIPF This gene encodes gastric lipase, an enzyme involved in the digestion of dietary triglycerides in the gastrointestinal tract, and responsible for 30% of fat digestion processes occurring in human. It is secreted by gastric chief cells in the fundic mucosa of the stomach, and it hydrolyzes the ester bonds of triglycerides under acidic pH conditions. The gene is a member of a conserved gene family of lipases that play distinct roles in neutral lipid metabolism. Several transcript variants encoding different isoforms have been found for this gene. ENSG00000182333 lipase F, gastric type NA
200316 APOBEC3F This gene is a member of the cytidine deaminase gene family. It is one of seven related genes or pseudogenes found in a cluster, thought to result from gene duplication, on chromosome 22. Members of the cluster encode proteins that are structurally and functionally related to the C to U RNA-editing cytidine deaminase APOBEC1. It is thought that the proteins may be RNA editing enzymes and have roles in growth or cell cycle control. Alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000128394 apolipoprotein B mRNA editing enzyme catalytic subunit 3F NA
113146 AHNAK2 NA ENSG00000185567 AHNAK nucleoprotein 2 NA
3290 HSD11B1 The protein encoded by this gene is a microsomal enzyme that catalyzes the conversion of the stress hormone cortisol to the inactive metabolite cortisone. In addition, the encoded protein can catalyze the reverse reaction, the conversion of cortisone to cortisol. Too much cortisol can lead to central obesity, and a particular variation in this gene has been associated with obesity and insulin resistance in children. Mutations in this gene and H6PD (hexose-6-phosphate dehydrogenase (glucose 1-dehydrogenase)) are the cause of cortisone reductase deficiency. Alternate splicing results in multiple transcript variants encoding the same protein. ENSG00000117594 hydroxysteroid 11-beta dehydrogenase 1 NA
100129845 PCOLCE-AS1 NA ENSG00000224729 PCOLCE antisense RNA 1 NA
4504 MT3 NA ENSG00000087250 metallothionein 3 NA
3263 HPX This gene encodes a plasma glycoprotein that binds heme with high affinity. The encoded protein is an acute phase protein that transports heme from the plasma to the liver and may be involved in protecting cells from oxidative stress. ENSG00000110169 hemopexin NA
112399 EGLN3 NA ENSG00000129521 egl-9 family hypoxia inducible factor 3 NA
11117 EMILIN1 This gene encodes an extracellular matrix glycoprotein that is characterized by an N-terminal microfibril interface domain, a coiled-coil alpha-helical domain, a collagenous domain and a C-terminal globular C1q domain. The encoded protein associates with elastic fibers at the interface between elastin and microfibrils and may play a role in the development of elastic tissues including large blood vessels, dermis, heart and lung. ENSG00000138080 elastin microfibril interfacer 1 NA
ENSG00000177337 DLGAP1-AS1 NA ENSG00000177337 DLGAP1 antisense RNA 1 NA
5304 PIP NA ENSG00000159763 prolactin induced protein NA
2944 GSTM1 Cytosolic and membrane-bound forms of glutathione S-transferase are encoded by two distinct supergene families. At present, eight distinct classes of the soluble cytoplasmic mammalian glutathione S-transferases have been identified: alpha, kappa, mu, omega, pi, sigma, theta and zeta. This gene encodes a glutathione S-transferase that belongs to the mu class. The mu class of enzymes functions in the detoxification of electrophilic compounds, including carcinogens, therapeutic drugs, environmental toxins and products of oxidative stress, by conjugation with glutathione. The genes encoding the mu class of enzymes are organized in a gene cluster on chromosome 1p13.3 and are known to be highly polymorphic. These genetic variations can change an individual’s susceptibility to carcinogens and toxins as well as affect the toxicity and efficacy of certain drugs. Null mutations of this class mu gene have been linked with an increase in a number of cancers, likely due to an increased susceptibility to environmental toxins and carcinogens. Multiple protein isoforms are encoded by transcript variants of this gene. ENSG00000134184 glutathione S-transferase mu 1 NA
11248 NXPH3 NA ENSG00000182575 neurexophilin 3 NA
ENSG00000257433 RP1-197B17.3 NA ENSG00000257433 NA NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_",11,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
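
The export step above writes every queried ID, including any the annotation service could not match (rows with notfound set to TRUE in the tables). A minimal sketch of a variant that skips those rows is given below. The helper name `write_factor_genes`, its arguments, and the toy data frame are hypothetical and not part of the original analysis; the sketch also assumes the `notfound` column may be absent when every ID is matched.

```r
# Hypothetical helper (not part of the original analysis): write the queried
# Ensembl IDs for factor k, skipping IDs flagged as not found.
write_factor_genes <- function(annot, k, out_dir) {
  # Assumption: `notfound` is TRUE for unmatched IDs and NA otherwise,
  # and the column may be missing when every ID was matched.
  if ("notfound" %in% names(annot)) {
    missing <- annot$notfound %in% TRUE
  } else {
    missing <- rep(FALSE, nrow(annot))
  }
  ids <- as.character(annot$query[!missing])
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(ids, path, col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(ids)
}

# Toy stand-in for as.data.frame(out), using three IDs from the Factor 12 table.
annot <- data.frame(
  query = c("ENSG00000175029", "ENSG00000255813", "ENSG00000170890"),
  symbol = c("CTBP2", NA, "PLA2G1B"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)
kept <- write_factor_genes(annot, 12, tempdir())
```

Whether unmatched IDs should be dropped depends on the downstream tool; the original call keeps them, which preserves the full top-feature list.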

Factor 12 Annotations

out <- mygene::queryMany(gene_list[12,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id name query summary notfound
CTBP2 1488 C-terminal binding protein 2 ENSG00000175029 This gene produces alternative transcripts encoding two distinct proteins. One protein is a transcriptional repressor, while the other isoform is a major component of specialized synapses known as synaptic ribbons. Both proteins contain a NAD+ binding domain similar to NAD+-dependent 2-hydroxyacid dehydrogenases. A portion of the 3’ untranslated region was used to map this gene to chromosome 21q21.3; however, it was noted that similar loci elsewhere in the genome are likely. Blast analysis shows that this gene is present on chromosome 10. Several transcript variants encoding two different isoforms have been found for this gene. NA
PLA2G1B 5319 phospholipase A2 group IB ENSG00000170890 This gene encodes a secreted member of the phospholipase A2 (PLA2) class of enzymes, which is produced by the pancreatic acinar cells. The encoded calcium-dependent enzyme catalyzes the hydrolysis of the sn-2 position of membrane glycerophospholipids to release arachidonic acid (AA) and lysophospholipids. AA is subsequently converted by downstream metabolic enzymes to several bioactive lipophilic compounds (eicosanoids), including prostaglandins (PGs) and leukotrienes (LTs). The enzyme may be involved in several physiological processes including cell contraction, cell proliferation and pathological response. NA
IRS2 8660 insulin receptor substrate 2 ENSG00000185950 This gene encodes the insulin receptor substrate 2, a cytoplasmic signaling molecule that mediates effects of insulin, insulin-like growth factor 1, and other cytokines by acting as a molecular adaptor between diverse receptor tyrosine kinases and downstream effectors. The product of this gene is phosphorylated by the insulin receptor tyrosine kinase upon receptor stimulation, as well as by an interleukin 4 receptor-associated kinase in response to IL4 treatment. NA
LOC100129550 100129550 uncharacterized LOC100129550 ENSG00000273033 NA NA
DHRS7 51635 dehydrogenase/reductase 7 ENSG00000100612 This gene encodes a member of the short-chain dehydrogenases/reductases (SDR) family, which has over 46,000 members. Members in this family are enzymes that metabolize many different compounds, such as steroid hormones, prostaglandins, retinoids, lipids and xenobiotics. NA
KLF13 51621 Kruppel like factor 13 ENSG00000169926 KLF13 belongs to a family of transcription factors that contain 3 classical zinc finger DNA-binding domains consisting of a zinc atom tetrahedrally coordinated by 2 cysteines and 2 histidines (C2H2 motif). These transcription factors bind to GC-rich sequences and related GT and CACCC boxes (Scohy et al., 2000 [PubMed 11087666]). NA
TNS3 64759 tensin 3 ENSG00000136205 NA NA
WWP2 11060 WW domain containing E3 ubiquitin protein ligase 2 ENSG00000198373 This gene encodes a member of the Nedd4 family of E3 ligases, which play an important role in protein ubiquitination. The encoded protein contains four WW domains and may play a role in multiple processes including chondrogenesis and the regulation of oncogenic signaling pathways via interactions with Smad proteins and the tumor suppressor PTEN. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene, and a pseudogene of this gene is located on the long arm of chromosome 10. NA
NDFIP1 80762 Nedd4 family interacting protein 1 ENSG00000131507 The protein encoded by this gene belongs to a small group of evolutionarily conserved proteins with three transmembrane domains. It is a potential target for ubiquitination by the Nedd4 family of proteins. This protein is thought to be part of a family of integral Golgi membrane proteins. NA
SATB1 6304 SATB homeobox 1 ENSG00000182568 This gene encodes a matrix protein which binds nuclear matrix and scaffold-associating DNAs through a unique nuclear architecture. The protein recruits chromatin-remodeling factors in order to regulate chromatin structure and gene expression. NA
STX3 6809 syntaxin 3 ENSG00000166900 The gene is a member of the syntaxin family. The encoded protein is targeted to the apical membrane of epithelial cells where it forms clusters and is important in establishing and maintaining polarity necessary for protein trafficking involving vesicle fusion and exocytosis. Alternative splicing results in multiple transcript variants. NA
SERINC1 57515 serine incorporator 1 ENSG00000111897 NA NA
AC068580.6 ENSG00000235027 NA ENSG00000235027 NA NA
MTMR10 54893 myotubularin related protein 10 ENSG00000166912 NA NA
TGOLN2 10618 trans-golgi network protein 2 ENSG00000152291 This gene encodes a type I integral membrane protein that is localized to the trans-Golgi network, a major sorting station for secretory and membrane proteins. The encoded protein cycles between early endosomes and the trans-Golgi network, and may play a role in exocytic vesicle formation. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
SLC26A6 65010 solute carrier family 26 member 6 ENSG00000225697 This gene belongs to the solute carrier 26 family, whose members encode anion transporter proteins. This particular family member encodes a protein involved in transporting chloride, oxalate, sulfate and bicarbonate. Alternatively spliced transcript variants encoding distinct isoforms have been described. NA
RP11-415J8.3 ENSG00000225313 NA ENSG00000225313 NA NA
GNAQ 2776 G protein subunit alpha q ENSG00000156052 This locus encodes a guanine nucleotide-binding protein. The encoded protein, an alpha subunit in the Gq class, couples a seven-transmembrane domain receptor to activation of phospholipase C-beta. Mutations at this locus have been associated with problems in platelet activation and aggregation. A related pseudogene exists on chromosome 2. NA
APLP2 334 amyloid beta precursor like protein 2 ENSG00000084234 This gene encodes amyloid precursor-like protein 2 (APLP2), which is a member of the APP (amyloid precursor protein) family including APP, APLP1 and APLP2. This protein is ubiquitously expressed. It contains heparin-, copper- and zinc-binding domains at the N-terminus, BPTI/Kunitz inhibitor and E2 domains in the middle region, and transmembrane and intracellular domains at the C-terminus. This protein interacts with major histocompatibility complex (MHC) class I molecules. The synergy of this protein and the APP is required to mediate neuromuscular transmission, spatial learning and synaptic plasticity. This protein has been implicated in the pathogenesis of Alzheimer’s disease. Multiple alternatively spliced transcript variants encoding different isoforms have been identified. NA
DCAF11 80344 DDB1 and CUL4 associated factor 11 ENSG00000100897 This gene encodes a WD repeat-containing protein that interacts with the COP9 signalosome, a macromolecular complex that interacts with cullin-RING E3 ligases and regulates their activity by hydrolyzing cullin-Nedd8 conjugates. Multiple alternatively spliced transcript variants have been found for this gene. NA
FKBP5 2289 FK506 binding protein 5 ENSG00000096060 The protein encoded by this gene is a member of the immunophilin protein family, which play a role in immunoregulation and basic cellular processes involving protein folding and trafficking. This encoded protein is a cis-trans prolyl isomerase that binds to the immunosuppressants FK506 and rapamycin. It is thought to mediate calcineurin inhibition. It also interacts functionally with mature hetero-oligomeric progesterone receptor complexes along with the 90 kDa heat shock protein and P23 protein. This gene has been found to have multiple polyadenylation sites. Alternative splicing results in multiple transcript variants. NA
FAM129B 64855 family with sequence similarity 129 member B ENSG00000136830 NA NA
RP11-10C24.3 ENSG00000271643 NA ENSG00000271643 NA NA
LOC100505635 100505635 uncharacterized LOC100505635 ENSG00000235033 NA NA
IL1RAP 3556 interleukin 1 receptor accessory protein ENSG00000196083 Interleukin 1 induces synthesis of acute phase and proinflammatory proteins during infection, tissue damage, or stress, by forming a complex at the cell membrane with an interleukin 1 receptor and an accessory protein. This gene encodes the interleukin 1 receptor accessory protein. The protein is a necessary part of the interleukin 1 receptor complex which initiates signalling events that result in the activation of interleukin 1-responsive genes. Alternative splicing of this gene results in two transcript variants encoding two different isoforms, one membrane-bound and one soluble. The ratio of soluble to membrane-bound forms increases during acute-phase induction or stress. NA
NA NA NA ENSG00000255813 NA TRUE
RP11-343L5.2 ENSG00000271862 NA ENSG00000271862 NA NA
PKI55 150967 DKFZp434H1419 ENSG00000260804 NA NA
MTSS1 9788 metastasis suppressor 1 ENSG00000170873 NA NA
RP11-256L6.2 ENSG00000257715 NA ENSG00000257715 NA NA
CTSD 1509 cathepsin D ENSG00000117984 This gene encodes a member of the A1 family of peptidases. The encoded preproprotein is proteolytically processed to generate multiple protein products. These products include the cathepsin D light and heavy chains, which heterodimerize to form the mature enzyme. This enzyme exhibits pepsin-like activity and plays a role in protein turnover and in the proteolytic activation of hormones and growth factors. Mutations in this gene play a causal role in neuronal ceroid lipofuscinosis-10 and may be involved in the pathogenesis of several other diseases, including breast cancer and possibly Alzheimer’s disease. NA
FTH1P23 ENSG00000242960 ferritin, heavy polypeptide 1 pseudogene 23 ENSG00000242960 NA NA
DIRC2 84925 disrupted in renal carcinoma 2 ENSG00000138463 This gene encodes a membrane-bound protein from the major facilitator superfamily of transporters. Disruption of this gene by translocation has been associated with haplo-insufficiency and renal cell carcinomas. Alternatively spliced transcript variants have been described, but their biological validity has not yet been determined. NA
RP11-253I19.3 ENSG00000255670 NA ENSG00000255670 NA NA
CMTM6 54918 CKLF like MARVEL transmembrane domain containing 6 ENSG00000091317 This gene belongs to the chemokine-like factor gene superfamily, a novel family that is similar to the chemokine and transmembrane 4 superfamilies. This gene is one of several chemokine-like factor genes located in a cluster on chromosome 3. This gene is widely expressed in many tissues, but the exact function of the encoded protein is unknown. NA
PLEC 5339 plectin ENSG00000178209 Plectin is a prominent member of an important family of structurally and in part functionally related proteins, termed plakins or cytolinkers, that are capable of interlinking different elements of the cytoskeleton. Plakins, with their multi-domain structure and enormous size, not only play crucial roles in maintaining cell and tissue integrity and orchestrating dynamic changes in cytoarchitecture and cell shape, but also serve as scaffolding platforms for the assembly, positioning, and regulation of signaling complexes (reviewed in PMID: 9701547, 11854008, and 17499243). Plectin is expressed as several protein isoforms in a wide range of cell types and tissues from a single gene located on chromosome 8 in humans (PMID: 8633055, 8698233). Until 2010, this locus was named plectin 1 (symbol PLEC1 in human; Plec1 in mouse and rat) and the gene product had been referred to as ‘hemidesmosomal protein 1’ or ‘plectin 1, intermediate filament binding 500kDa’. These names were superseded by plectin. The plectin gene locus in mouse on chromosome 15 has been analyzed in detail (PMID: 10556294, 14559777), revealing a genomic exon-intron organization with well over 40 exons spanning over 62 kb and an unusual 5’ transcript complexity of plectin isoforms. Eleven exons (1-1j) have been identified that alternatively splice directly into a common exon 2 which is the first exon to encode plectin’s highly conserved actin binding domain (ABD). Three additional exons (-1, 0a, and 0) splice into an alternative first coding exon (1c), and two additional exons (2alpha and 3alpha) are optionally spliced within the exons encoding the actin binding domain (exons 2-8). Analysis of the human locus has identified eight of the eleven alternative 5’ exons found in mouse and rat (PMID: 14672974); exons 1i, 1j and 1h have not been confirmed in human. Furthermore, isoforms lacking the central rod domain encoded by exon 31 have been detected in mouse (PMID: 10556294), rat (PMID: 9177781), and human (PMID: 11441066, 10780662, 20052759). The short alternative amino-terminal sequences encoded by the different first exons direct the targeting of the various isoforms to distinct subcellular locations (PMID: 14559777). As the expression of specific plectin isoforms was found to be dependent on cell type (tissue) and stage of development (PMID: 10556294, 12542521, 17389230) it appears that each cell type (tissue) contains a unique set (proportion and composition) of plectin isoforms, as if custom-made for specific requirements of the particular cells. Concordantly, individual isoforms were found to carry out distinct and specific functions (PMID: 14559777, 12542521, 18541706). In 1996, a number of groups reported that patients suffering from epidermolysis bullosa simplex with muscular dystrophy (EBS-MD) lacked plectin expression in skin and muscle tissues due to defects in the plectin gene (PMID: 8698233, 8941634, 8636409, 8894687, 8696340). Two other subtypes of plectin-related EBS have been described: EBS-pyloric atresia (PA) and EBS-Ogna. For reviews of plectin-related diseases see PMID: 15810881, 19945614. Mutations in the plectin gene related to human diseases should be named based on the position in NM_000445 (variant 1, isoform 1c), unless the mutation is located within one of the other alternative first exons, in which case the position in the respective Reference Sequence should be used. NA
CD82 3732 CD82 molecule ENSG00000085117 This metastasis suppressor gene product is a membrane glycoprotein that is a member of the transmembrane 4 superfamily. Expression of this gene has been shown to be downregulated in tumor progression of human cancers and can be activated by p53 through a consensus binding sequence in the promoter. Its expression and that of p53 are strongly correlated, and the loss of expression of these two proteins is associated with poor survival for prostate cancer patients. Two alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. NA
ACVR1 90 activin A receptor type 1 ENSG00000115170 Activins are dimeric growth and differentiation factors which belong to the transforming growth factor-beta (TGF-beta) superfamily of structurally related signaling proteins. Activins signal through a heteromeric complex of receptor serine kinases which include at least two type I (I and IB) and two type II (II and IIB) receptors. These receptors are all transmembrane proteins, composed of a ligand-binding extracellular domain with cysteine-rich region, a transmembrane domain, and a cytoplasmic domain with predicted serine/threonine specificity. Type I receptors are essential for signaling; and type II receptors are required for binding ligands and for expression of type I receptors. Type I and II receptors form a stable complex after ligand binding, resulting in phosphorylation of type I receptors by type II receptors. This gene encodes activin A type I receptor which signals a particular transcriptional response in concert with activin type II receptors. Mutations in this gene are associated with fibrodysplasia ossificans progressiva. NA
NA NA NA ENSG00000256845 NA TRUE
UBXN2B 137886 UBX domain protein 2B ENSG00000215114 NA NA
SLC36A1 206358 solute carrier family 36 member 1 ENSG00000123643 This gene encodes a member of the eukaryote-specific amino acid/auxin permease (AAAP) 1 transporter family. The encoded protein functions as a proton-dependent, small amino acid transporter. This gene is clustered with related family members on chromosome 5q33.1. Alternative splicing results in multiple transcript variants. NA
MAVS 57506 mitochondrial antiviral signaling protein ENSG00000088888 This gene encodes an intermediary protein necessary in the virus-triggered beta interferon signaling pathways. It is required for activation of transcription factors which regulate expression of beta interferon and contributes to antiviral immunity. Multiple transcript variants encoding different isoforms have been found for this gene. NA
DAZAP2 9802 DAZ associated protein 2 ENSG00000183283 This gene encodes a proline-rich protein which interacts with the deleted in azoospermia (DAZ) and the deleted in azoospermia-like gene through the DAZ-like repeats. This protein also interacts with the transforming growth factor-beta signaling molecule SARA (Smad anchor for receptor activation), eukaryotic initiation factor 4G, and an E3 ubiquitinase that regulates its stability in splicing factor containing nuclear speckles. The encoded protein may function in various biological and pathological processes including spermatogenesis, cell signaling and transcription regulation, formation of stress granules during translation arrest, RNA splicing, and pathogenesis of multiple myeloma. Multiple transcript variants encoding different isoforms have been found for this gene. NA
RP11-809N8.4 ENSG00000256448 NA ENSG00000256448 NA NA
RP11-732A19.9 ENSG00000255680 NA ENSG00000255680 NA NA
OAT 4942 ornithine aminotransferase ENSG00000065154 This gene encodes the mitochondrial enzyme ornithine aminotransferase, which is a key enzyme in the pathway that converts arginine and ornithine into the major excitatory and inhibitory neurotransmitters glutamate and GABA. Mutations that result in a deficiency of this enzyme cause the autosomal recessive eye disease Gyrate Atrophy. Alternatively spliced transcript variants encoding different isoforms have been described. Related pseudogenes have been defined on the X chromosome. NA
TM9SF2 9375 transmembrane 9 superfamily member 2 ENSG00000125304 This gene encodes a member of the transmembrane 9 superfamily. The encoded 76 kDa protein localizes to early endosomes in human cells. The encoded protein possesses a conserved and highly hydrophobic C-terminal domain which contains nine transmembrane domains. The protein may play a role in small molecule transport or act as an ion channel. A pseudogene associated with this gene is located on the X chromosome. NA
FEZ2 9637 fasciculation and elongation protein zeta 2 ENSG00000171055 This gene is an ortholog of the C. elegans unc-76 gene, which is necessary for normal axonal bundling and elongation within axon bundles. Other orthologs include the rat gene that encodes zygin II, which can bind to synaptotagmin. NA
RP5-1039K5.13 ENSG00000233739 NA ENSG00000233739 NA NA
KIF13A 63971 kinesin family member 13A ENSG00000137177 This gene encodes a member of the kinesin family of microtubule-based motor proteins that function in the positioning of endosomes. This family member can direct mannose-6-phosphate receptor-containing vesicles from the trans-Golgi network to the plasma membrane, and it is necessary for the steady-state distribution of late endosomes/lysosomes. It is also required for the translocation of FYVE-CENT and TTC19 from the centrosome to the midbody during cytokinesis, and it plays a role in melanosome maturation. Alternative splicing of this gene results in multiple transcript variants. NA
ATP9A 10079 ATPase phospholipid transporting 9A (putative) ENSG00000054793 NA NA
SOS2 6655 SOS Ras/Rho guanine nucleotide exchange factor 2 ENSG00000100485 NA NA
KIAA0232 9778 KIAA0232 ENSG00000170871 NA NA
C15orf52 388115 chromosome 15 open reading frame 52 ENSG00000188549 NA NA
RP11-596D21.1 ENSG00000257831 NA ENSG00000257831 NA NA
SIAE 54414 sialic acid acetylesterase ENSG00000110013 This gene encodes an enzyme which removes 9-O-acetylation modifications from sialic acids. Mutations in this gene are associated with susceptibility to autoimmune disease 6. Multiple transcript variants encoding different isoforms, found either in the cytosol or in the lysosome, have been found for this gene. NA
TMBIM6 7009 transmembrane BAX inhibitor motif containing 6 ENSG00000139644 NA NA
CDK5RAP2 55755 CDK5 regulatory subunit associated protein 2 ENSG00000136861 This gene encodes a regulator of CDK5 (cyclin-dependent kinase 5) activity. The protein encoded by this gene is localized to the centrosome and Golgi complex, interacts with CDK5R1 and pericentrin (PCNT), plays a role in centriole engagement and microtubule nucleation, and has been linked to primary microcephaly and Alzheimer’s disease. Alternative splicing results in multiple transcript variants. NA
SOX13 9580 SRY-box 13 ENSG00000143842 This gene encodes a member of the SOX (SRY-related HMG-box) family of transcription factors involved in the regulation of embryonic development and in the determination of cell fate. The encoded protein may act as a transcriptional regulator after forming a protein complex with other proteins. It has also been determined to be a type-1 diabetes autoantigen, also known as islet cell antibody 12. NA
RP11-58K22.5 ENSG00000254693 NA ENSG00000254693 NA NA
AVIL 10677 advillin ENSG00000135407 The protein encoded by this gene is a member of the gelsolin/villin family of actin regulatory proteins. This protein has structural similarity to villin. It binds actin and may play a role in the development of neuronal cells that form ganglia. NA
SECISBP2L 9728 SECIS binding protein 2 like ENSG00000138593 NA NA
AP000295.10 ENSG00000272659 NA ENSG00000272659 NA NA
CCPG1 9236 cell cycle progression 1 ENSG00000260916 NA NA
SQRDL 58472 sulfide quinone reductase-like (yeast) ENSG00000137767 The protein encoded by this gene may function in mitochondria to catalyze the conversion of sulfide to persulfides, thereby decreasing toxic concentrations of sulfide. Alternative splicing results in multiple transcript variants that encode the same protein. NA
ARHGEF11 9826 Rho guanine nucleotide exchange factor 11 ENSG00000132694 Rho GTPases play a fundamental role in numerous cellular processes that are initiated by extracellular stimuli that work through G protein coupled receptors. The encoded protein may form a complex with G proteins and stimulate Rho-dependent signals. A similar protein in rat interacts with glutamate transporter EAAT4 and modulates its glutamate transport activity. Expression of the rat protein induces the reorganization of the actin cytoskeleton and its overexpression induces the formation of membrane ruffling and filopodia. Two alternative transcripts encoding different isoforms have been described. NA
CMTM4 146223 CKLF like MARVEL transmembrane domain containing 4 ENSG00000183723 This gene belongs to the chemokine-like factor gene superfamily, a novel family that is similar to the chemokine and the transmembrane 4 superfamilies of signaling molecules. This gene is one of several chemokine-like factor genes located in a cluster on chromosome 16. Alternatively spliced transcript variants encoding different isoforms have been identified. NA
NA NA NA ENSG00000272091 NA TRUE
CD302 9936 CD302 molecule ENSG00000241399 CD302 is a C-type lectin receptor involved in cell adhesion and migration, as well as endocytosis and phagocytosis (Kato et al., 2007 [PubMed 17947679]). NA
FCHO2 115548 FCH domain only 2 ENSG00000157107 NA NA
HIP1 3092 huntingtin interacting protein 1 ENSG00000127946 The product of this gene is a membrane-associated protein that functions in clathrin-mediated endocytosis and protein trafficking within the cell. The encoded protein binds to the huntingtin protein in the brain; this interaction is lost in Huntington’s disease. Alternative splicing results in multiple transcript variants. NA
RP11-175O19.4 ENSG00000231025 NA ENSG00000231025 NA NA
ERGIC1 57222 endoplasmic reticulum-golgi intermediate compartment 1 ENSG00000113719 This gene encodes a cycling membrane protein which is an endoplasmic reticulum-golgi intermediate compartment (ERGIC) protein which interacts with other members of this protein family to increase their turnover. NA
NA NA NA ENSG00000203305 NA TRUE
RP11-1084A12.2 ENSG00000259468 NA ENSG00000259468 NA NA
TNFSF13 8741 tumor necrosis factor superfamily member 13 ENSG00000161955 The protein encoded by this gene is a member of the tumor necrosis factor (TNF) ligand family. This protein is a ligand for TNFRSF17/BCMA, a member of the TNF receptor family. This protein and its receptor are both found to be important for B cell development. In vitro experiments suggested that this protein may be able to induce apoptosis through its interaction with other TNF receptor family proteins such as TNFRSF6/FAS and TNFRSF14/HVEM. Alternative splicing results in multiple transcript variants. Some transcripts that skip the last exon of the upstream gene (TNFSF12) and continue into the second exon of this gene have been identified; such read-through transcripts are contained in GeneID 407977, TNFSF12-TNFSF13. NA
FTH1P2 ENSG00000234975 ferritin, heavy polypeptide 1 pseudogene 2 ENSG00000234975 NA NA
FTH1P7 ENSG00000232187 ferritin, heavy polypeptide 1 pseudogene 7 ENSG00000232187 NA NA
PSAP 5660 prosaposin ENSG00000197746 This gene encodes a highly conserved preproprotein that is proteolytically processed to generate four main cleavage products including saposins A, B, C, and D. Each domain of the precursor protein is approximately 80 amino acid residues long with nearly identical placement of cysteine residues and glycosylation sites. Saposins A-D localize primarily to the lysosomal compartment where they facilitate the catabolism of glycosphingolipids with short oligosaccharide groups. The precursor protein exists both as a secretory protein and as an integral membrane protein and has neurotrophic activities. Mutations in this gene have been associated with Gaucher disease and metachromatic leukodystrophy. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. NA
CNN2P1 ENSG00000227201 calponin 2 pseudogene 1 ENSG00000227201 NA NA
ENDOD1 23052 endonuclease domain containing 1 ENSG00000149218 NA NA
CTD-2049O4.1 ENSG00000261329 NA ENSG00000261329 NA NA
DSTYK 25778 dual serine/threonine and tyrosine protein kinase ENSG00000133059 This gene encodes a dual serine/threonine and tyrosine protein kinase which is expressed in multiple tissues. It is thought to function as a regulator of cell death. Multiple transcript variants encoding different isoforms have been found for this gene. NA
RP11-1000B6.3 ENSG00000261064 NA ENSG00000261064 NA NA
GLIPR1 11010 GLI pathogenesis related 1 ENSG00000139278 This gene encodes a protein with similarity to both the pathogenesis-related protein (PR) superfamily and the cysteine-rich secretory protein (CRISP) family. Increased expression of this gene is associated with myelomocytic differentiation in macrophage and decreased expression of this gene through gene methylation is associated with prostate cancer. The protein has proapoptotic activities in prostate and bladder cancer cells. This gene is a member of a cluster on chromosome 12 containing two other similar genes. Alternatively spliced variants which encode different protein isoforms have been described; however, not all variants have been fully characterized. NA
FTH1 2495 ferritin heavy chain 1 ENSG00000167996 This gene encodes the heavy subunit of ferritin, the major intracellular iron storage protein in prokaryotes and eukaryotes. It is composed of 24 subunits of the heavy and light ferritin chains. Variation in ferritin subunit composition may affect the rates of iron uptake and release in different tissues. A major function of ferritin is the storage of iron in a soluble and nontoxic state. Defects in ferritin proteins are associated with several neurodegenerative diseases. This gene has multiple pseudogenes. Several alternatively spliced transcript variants have been observed, but their biological validity has not been determined. NA
TDP2 51567 tyrosyl-DNA phosphodiesterase 2 ENSG00000111802 This gene encodes a member of a superfamily of divalent cation-dependent phosphodiesterases. The encoded protein associates with CD40, tumor necrosis factor (TNF) receptor-75 and TNF receptor associated factors (TRAFs), and inhibits nuclear factor-kappa-B activation. This protein has sequence and structural similarities with APE1 endonuclease, which is involved in both DNA repair and the activation of transcription factors. NA
ADIPOR1 51094 adiponectin receptor 1 ENSG00000159346 This gene encodes a protein which acts as a receptor for adiponectin, a hormone secreted by adipocytes which regulates fatty acid catabolism and glucose levels. Binding of adiponectin to the encoded protein results in activation of an AMP-activated kinase signaling pathway which affects levels of fatty acid oxidation and insulin sensitivity. A pseudogene of this gene is located on chromosome 14. Multiple alternatively spliced transcript variants have been found for this gene. NA
CTD-2139B15.2 ENSG00000248223 NA ENSG00000248223 NA NA
CTC-429P9.5 ENSG00000267904 NA ENSG00000267904 NA NA
JKAMP 51528 JNK1/MAPK8-associated membrane protein ENSG00000050130 NA NA
ZNF641 121274 zinc finger protein 641 ENSG00000167528 NA NA
CTBS 1486 chitobiase ENSG00000117151 Chitobiase is a lysosomal glycosidase involved in degradation of asparagine-linked oligosaccharides on glycoproteins (Aronson and Kuranda, 1989 [PubMed 2531691]). NA
FBXL3 26224 F-box and leucine rich repeat protein 3 ENSG00000005812 This gene encodes a member of the F-box protein family which is characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into 3 classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbls class and, in addition to an F-box, contains several tandem leucine-rich repeats and is localized in the nucleus. NA
RP3-510O8.4 ENSG00000232909 NA ENSG00000232909 NA NA
FAM21C 253725 family with sequence similarity 21 member C ENSG00000172661 NA NA
FBXO38 81545 F-box protein 38 ENSG00000145868 This gene encodes a large protein that contains an F-box domain and may participate in protein ubiquitination. The encoded protein is a transcriptional co-activator of Krueppel-like factor 7 (Klf7). A heterozygous mutation in this gene was found in individuals with autosomal dominant distal hereditary motor neuronopathy type IID. There is a pseudogene for this gene on chromosome 4. Alternative splicing results in multiple transcript variants. NA
MICU1 10367 mitochondrial calcium uptake 1 ENSG00000107745 This gene encodes an essential regulator of mitochondrial Ca2+ uptake under basal conditions. The encoded protein interacts with the mitochondrial calcium uniporter, a mitochondrial inner membrane Ca2+ channel, and is essential in preventing mitochondrial Ca2+ overload, which can cause excessive production of reactive oxygen species and cell stress. Alternatively spliced transcript variants encoding different isoforms have been described. NA
CKAP4 10970 cytoskeleton-associated protein 4 ENSG00000136026 NA NA
PSKH1 5681 protein serine kinase H1 ENSG00000159792 NA NA
# Save the queried Ensembl IDs for factor 12, one ID per line
write.table(as.factor(out$query),
            file = paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 12, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
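
Note that out$query can contain IDs that mygene could not resolve (notfound is TRUE) as well as IDs that map to more than one record, so the exported list is not necessarily one line per resolved gene. A minimal sketch of how such cases might be tallied before export is given here. It uses a toy data frame in place of as.data.frame(out), and the names toy_out and summarise_hits are hypothetical, not part of the SFA helper scripts.

```r
# Toy stand-in for as.data.frame(out); the real object comes from mygene::queryMany().
toy_out <- data.frame(
  query    = c("ENSG00000182230", "ENSG00000182230",
               "ENSG00000271738", "ENSG00000172201"),
  notfound = c(NA, NA, TRUE, NA),
  symbol   = c("FAM153B", "LOC100507387", NA, "ID4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: separate resolved, unresolved and multi-hit query IDs.
summarise_hits <- function(df) {
  # The notfound column is assumed to be absent when every ID resolves.
  if (is.null(df$notfound)) {
    unresolved <- rep(FALSE, nrow(df))
  } else {
    unresolved <- !is.na(df$notfound) & df$notfound
  }
  list(
    unique_ids = unique(df$query),                      # one entry per queried ID
    resolved   = unique(df$query[!unresolved]),         # IDs with at least one record
    unresolved = unique(df$query[unresolved]),          # IDs flagged notfound
    multi_hit  = unique(df$query[duplicated(df$query)]) # IDs returned more than once
  )
}

hits <- summarise_hits(toy_out)
str(hits)
```

Writing hits$unique_ids (or hits$resolved) instead of out$query would give one line per gene; whether that is preferable depends on what the downstream tool expects.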

Factor 13 Annotations

out <- mygene::queryMany(gene_list[13,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query notfound X_id symbol summary name
ENSG00000271738 TRUE NA NA NA NA
ENSG00000272512 NA ENSG00000272512 RP11-54O7.17 NA NA
ENSG00000172201 NA 3400 ID4 This gene encodes a member of the inhibitor of DNA binding (ID) protein family. These proteins are basic helix-loop-helix transcription factors which can act as tumor suppressors but lack DNA binding activity. Consequently, the activity of the encoded protein depends on the protein binding partner. inhibitor of DNA binding 4, HLH protein
ENSG00000173175 NA 111 ADCY5 This gene encodes a member of the membrane-bound adenylyl cyclase enzymes. Adenylyl cyclases mediate G protein-coupled receptor signaling through the synthesis of the second messenger cAMP. Activity of the encoded protein is stimulated by the Gs alpha subunit of G protein-coupled receptors and is inhibited by protein kinase A, calcium and Gi alpha subunits. Single nucleotide polymorphisms in this gene may be associated with low birth weight and type 2 diabetes. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. adenylate cyclase 5
ENSG00000229732 NA ENSG00000229732 AC019349.5 NA NA
ENSG00000122378 NA 84293 FAM213A NA family with sequence similarity 213 member A
ENSG00000133401 NA 23037 PDZD2 The protein encoded by this gene contains six PDZ domains and shares sequence similarity with pro-interleukin-16 (pro-IL-16). Like pro-IL-16, the encoded protein localizes to the endoplasmic reticulum and is thought to be cleaved by a caspase to produce a secreted peptide containing two PDZ domains. In addition, this gene is upregulated in primary prostate tumors and may be involved in the early stages of prostate tumorigenesis. PDZ domain containing 2
ENSG00000198300 NA 5178 PEG3 In human, ZIM2 and PEG3 are treated as two distinct genes though they share multiple 5’ exons and a common promoter and both genes are paternally expressed (PMID:15203203). Alternative splicing events connect their shared 5’ exons either with the remaining 4 exons unique to ZIM2, or with the remaining 2 exons unique to PEG3. In contrast, in other mammals ZIM2 does not undergo imprinting and, in mouse, cow, and likely other mammals as well, the ZIM2 and PEG3 genes do not share exons. Human PEG3 protein belongs to the Kruppel C2H2-type zinc finger protein family. PEG3 may play a role in cell proliferation and p53-mediated apoptosis. PEG3 has also shown tumor suppressor activity and tumorigenesis in glioma and ovarian cells. Alternative splicing of this PEG3 gene results in multiple transcript variants encoding distinct isoforms. paternally expressed 3
ENSG00000157404 NA 3815 KIT This gene encodes the human homolog of the proto-oncogene c-kit. C-kit was first identified as the cellular homolog of the feline sarcoma viral oncogene v-kit. This protein is a type 3 transmembrane receptor for MGF (mast cell growth factor, also known as stem cell factor). Mutations in this gene are associated with gastrointestinal stromal tumors, mast cell disease, acute myelogenous leukemia, and piebaldism. Multiple transcript variants encoding different isoforms have been found for this gene. KIT proto-oncogene receptor tyrosine kinase
ENSG00000105088 NA 93145 OLFM2 NA olfactomedin 2
ENSG00000160145 NA 8997 KALRN Huntington’s disease (HD), a neurodegenerative disorder characterized by loss of striatal neurons, is caused by an expansion of a polyglutamine tract in the HD protein huntingtin. This gene encodes a protein that interacts with the huntingtin-associated protein 1, which is a huntingtin binding protein that may function in vesicle trafficking. kalirin, RhoGEF kinase
ENSG00000272678 NA ENSG00000272678 RP11-797D24.4 NA NA
ENSG00000163485 NA 134 ADORA1 The protein encoded by this gene is an adenosine receptor that belongs to the G-protein coupled receptor 1 family. There are 3 types of adenosine receptors, each with a specific pattern of ligand binding and tissue distribution, and together they regulate a diverse set of physiologic functions. The type A1 receptors inhibit adenylyl cyclase, and play a role in the fertilization process. Animal studies also suggest a role for A1 receptors in kidney function and ethanol intoxication. Transcript variants with alternative splicing in the 5’ UTR have been found for this gene. adenosine A1 receptor
ENSG00000173898 NA 6712 SPTBN2 Spectrins are principal components of a cell’s membrane-cytoskeleton and are composed of two alpha and two beta spectrin subunits. The protein encoded by this gene (SPTBN2) is called spectrin beta non-erythrocytic 2 or beta-III spectrin. It is related to, but distinct from, the beta-II spectrin gene which is also known as spectrin beta non-erythrocytic 1 (SPTBN1). SPTBN2 regulates the glutamate signaling pathway by stabilizing the glutamate transporter EAAT4 at the surface of the plasma membrane. Mutations in this gene cause a form of spinocerebellar ataxia, SCA5, that is characterized by neurodegeneration, progressive locomotor incoordination, dysarthria, and uncoordinated eye movements. spectrin beta, non-erythrocytic 2
ENSG00000171766 NA 2628 GATM This gene encodes a mitochondrial enzyme that belongs to the amidinotransferase family. This enzyme is involved in creatine biosynthesis, whereby it catalyzes the transfer of a guanido group from L-arginine to glycine, resulting in guanidinoacetic acid, the immediate precursor of creatine. Mutations in this gene cause arginine:glycine amidinotransferase deficiency, an inborn error of creatine synthesis characterized by mental retardation, language impairment, and behavioral disorders. glycine amidinotransferase
ENSG00000025039 NA 58528 RRAGD RRAGD is a monomeric guanine nucleotide-binding protein, or G protein. By binding GTP or GDP, small G proteins act as molecular switches in numerous cell processes and signaling pathways. Ras related GTP binding D
ENSG00000182902 NA 83733 SLC25A18 NA solute carrier family 25 member 18
ENSG00000103034 NA 65009 NDRG4 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that is required for cell cycle progression and survival in primary astrocytes and may be involved in the regulation of mitogenic signalling in vascular smooth muscle cells. Alternative splicing results in multiple transcripts encoding different isoforms. NDRG family member 4
ENSG00000149809 NA 7108 TM7SF2 NA transmembrane 7 superfamily member 2
ENSG00000162373 NA 79656 BEND5 NA BEN domain containing 5
ENSG00000121690 NA 91614 DEPDC7 NA DEP domain containing 7
ENSG00000163209 NA 6707 SPRR3 NA small proline rich protein 3
ENSG00000260244 NA ENSG00000260244 RP11-588K22.2 NA NA
ENSG00000136237 NA 9771 RAPGEF5 Members of the RAS (see HRAS; MIM 190020) subfamily of GTPases function in signal transduction as GTP/GDP-regulated switches that cycle between inactive GDP- and active GTP-bound states. Guanine nucleotide exchange factors (GEFs), such as RAPGEF5, serve as RAS activators by promoting acquisition of GTP to maintain the active GTP-bound state and are the key link between cell surface receptors and RAS activation (Rebhun et al., 2000 [PubMed 10934204]). Rap guanine nucleotide exchange factor 5
ENSG00000204677 NA ENSG00000204677 FAM153C NA family with sequence similarity 153 member C
ENSG00000236609 NA 54753 ZNF853 NA zinc finger protein 853
ENSG00000169509 NA 54544 CRCT1 NA cysteine rich C-terminal 1
ENSG00000163864 NA 349565 NMNAT3 This gene encodes a member of the nicotinamide/nicotinic acid mononucleotide adenylyltransferase family. These enzymes use ATP to catalyze the synthesis of nicotinamide adenine dinucleotide or nicotinic acid adenine dinucleotide from nicotinamide mononucleotide or nicotinic acid mononucleotide, respectively. The encoded protein is localized to mitochondria and may also play a neuroprotective role as a molecular chaperone. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. nicotinamide nucleotide adenylyltransferase 3
ENSG00000268358 TRUE NA NA NA NA
ENSG00000125378 NA 652 BMP4 This gene encodes a member of the bone morphogenetic protein (BMP) family of proteins, which is part of the transforming growth factor-beta (TGF-beta) superfamily. Members of the BMP family play an important role in bone and cartilage development. The encoded preproprotein is proteolytically processed to generate each subunit of the disulfide-linked homodimer. Mutations in this gene are associated with orofacial cleft and microphthalmia in human patients. The encoded protein may also be involved in the pathology of multiple cardiovascular diseases and human cancers. Alternative splicing results in multiple transcript variants. bone morphogenetic protein 4
ENSG00000125780 NA 7053 TGM3 Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene consists of two polypeptide chains activated from a single precursor protein by proteolysis. The encoded protein is involved in the later stages of cell envelope formation in the epidermis and hair follicle. transglutaminase 3
ENSG00000182985 NA 23705 CADM1 NA cell adhesion molecule 1
ENSG00000164116 NA 2982 GUCY1A3 Soluble guanylate cyclases are heterodimeric proteins that catalyze the conversion of GTP to 3’,5’-cyclic GMP and pyrophosphate. The protein encoded by this gene is an alpha subunit of this complex and it interacts with a beta subunit to form the guanylate cyclase enzyme, which is activated by nitric oxide. Several transcript variants encoding a few different isoforms have been found for this gene. guanylate cyclase 1, soluble, alpha 3
ENSG00000259933 NA ENSG00000259933 RP11-304L19.1 NA NA
ENSG00000182230 NA 202134 FAM153B NA family with sequence similarity 153 member B
ENSG00000182230 NA 100507387 LOC100507387 NA uncharacterized LOC100507387
ENSG00000186998 NA 129080 EMID1 NA EMI domain containing 1
ENSG00000106772 NA 158471 PRUNE2 The protein encoded by this gene belongs to the B-cell CLL/lymphoma 2 and adenovirus E1B 19 kDa interacting family, whose members play roles in many cellular processes including apoptosis, cell transformation, and synaptic function. Several functions for this protein have been demonstrated including suppression of Ras homolog family member A activity, which results in reduced stress fiber formation and suppression of oncogenic cellular transformation. A high molecular weight isoform of this protein has also been shown to colocalize with Adaptor protein complex 2, beta-Adaptin and endodermal markers, suggesting an involvement in post-endocytic trafficking. In prostate cancer cells, this gene acts as a tumor suppressor and its expression is regulated by prostate cancer antigen 3, a non-protein coding gene on the opposite DNA strand in an intron of this gene. Prostate cancer antigen 3 regulates levels of this gene through formation of a double-stranded RNA that undergoes adenosine deaminase acting on RNA-dependent adenosine-to-inosine RNA editing. Alternative splicing results in multiple transcript variants. prune homolog 2
ENSG00000124225 NA 56937 PMEPA1 This gene encodes a transmembrane protein that contains a Smad interacting motif (SIM). Expression of this gene is induced by androgens and transforming growth factor beta, and the encoded protein suppresses the androgen receptor and transforming growth factor beta signaling pathways through interactions with Smad proteins. Overexpression of this gene may play a role in multiple types of cancer. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. prostate transmembrane protein, androgen induced 1
ENSG00000134121 NA 10752 CHL1 The protein encoded by this gene is a member of the L1 gene family of neural cell adhesion molecules. It is a neural recognition molecule that may be involved in signal transduction pathways. The deletion of one copy of this gene may be responsible for mental defects in patients with 3p- syndrome. This protein may also play a role in the growth of certain cancers. Alternate splicing results in both coding and non-coding variants. cell adhesion molecule L1 like
ENSG00000135423 NA 27165 GLS2 The protein encoded by this gene is a mitochondrial phosphate-activated glutaminase that catalyzes the hydrolysis of glutamine to stoichiometric amounts of glutamate and ammonia. Originally thought to be liver-specific, this protein has been found in other tissues as well. Alternative splicing results in multiple transcript variants that encode different isoforms. glutaminase 2
ENSG00000154330 NA 5239 PGM5 Phosphoglucomutases (EC 5.4.2.2), such as PGM5, are phosphotransferases involved in interconversion of glucose-1-phosphate and glucose-6-phosphate. PGM activity is essential in formation of carbohydrates from glucose-6-phosphate and in formation of glucose-6-phosphate from galactose and glycogen (Edwards et al., 1995 [PubMed 8586438]). phosphoglucomutase 5
ENSG00000140451 NA 80119 PIF1 This gene encodes a DNA-dependent adenosine triphosphate (ATP)-metabolizing enzyme that functions as a 5’ to 3’ DNA helicase. The encoded protein can resolve G-quadruplex structures and RNA-DNA hybrids at the ends of chromosomes. It also prevents telomere elongation by inhibiting the actions of telomerase. Alternative splicing and the use of alternative start codons results in multiple isoforms that are differentially localized to either the mitochondria or the nucleus. PIF1 5’-to-3’ DNA helicase
ENSG00000142178 NA 150094 SIK1 NA salt inducible kinase 1
ENSG00000130787 NA 9026 HIP1R NA huntingtin interacting protein 1 related
ENSG00000130702 NA 3911 LAMA5 This gene encodes one of the vertebrate laminin alpha chains. Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. The protein encoded by this gene is the alpha-5 subunit of laminin-10 (laminin-511), laminin-11 (laminin-521) and laminin-15 (laminin-523). laminin subunit alpha 5
ENSG00000008710 NA 5310 PKD1 This gene encodes a member of the polycystin protein family. The encoded glycoprotein contains a large N-terminal extracellular region, multiple transmembrane domains and a cytoplasmic C-tail. It is an integral membrane protein that functions as a regulator of calcium permeable cation channels and intracellular calcium homoeostasis. It is also involved in cell-cell/matrix interactions and may modulate G-protein-coupled signal-transduction pathways. It plays a role in renal tubular development, and mutations in this gene cause autosomal dominant polycystic kidney disease type 1 (ADPKD1). ADPKD1 is characterized by the growth of fluid-filled cysts that replace normal renal tissue and result in end-stage renal failure. Splice variants encoding different isoforms have been noted for this gene. Also, six pseudogenes, closely linked in a known duplicated region on chromosome 16p, have been described. polycystin 1, transient receptor potential channel interacting
ENSG00000205336 NA 9289 ADGRG1 This gene encodes a member of the G protein-coupled receptor family and regulates brain cortical patterning. The encoded protein binds specifically to transglutaminase 2, a component of tissue and tumor stroma implicated as an inhibitor of tumor progression. Mutations in this gene are associated with a brain malformation known as bilateral frontoparietal polymicrogyria. Alternative splicing results in multiple transcript variants. adhesion G protein-coupled receptor G1
ENSG00000169758 NA 123591 TMEM266 NA transmembrane protein 266
ENSG00000184674 TRUE NA NA NA NA
ENSG00000169116 NA 25849 PARM1 NA prostate androgen-regulated mucin-like protein 1
ENSG00000188732 NA 340277 FAM221A NA family with sequence similarity 221 member A
ENSG00000101096 NA 4773 NFATC2 This gene is a member of the nuclear factor of activated T cells (NFAT) family. The product of this gene is a DNA-binding protein with a REL-homology region (RHR) and an NFAT-homology region (NHR). This protein is present in the cytosol and only translocates to the nucleus upon T cell receptor (TCR) stimulation, where it becomes a member of the nuclear factors of activated T cells transcription complex. This complex plays a central role in inducing gene transcription during the immune response. Alternate transcriptional splice variants encoding different isoforms have been characterized. nuclear factor of activated T-cells 2
ENSG00000231584 NA ENSG00000231584 FAHD2CP NA fumarylacetoacetate hydrolase domain containing 2C, pseudogene
ENSG00000171772 NA 93426 SYCE1 NA synaptonemal complex central element protein 1
ENSG00000171401 NA 3860 KRT13 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. keratin 13
ENSG00000039068 NA 999 CDH1 This gene encodes a classical cadherin of the cadherin superfamily. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature glycoprotein. This calcium-dependent cell-cell adhesion protein is comprised of five extracellular cadherin repeats, a transmembrane region and a highly conserved cytoplasmic tail. Mutations in this gene are correlated with gastric, breast, colorectal, thyroid and ovarian cancer. Loss of function of this gene is thought to contribute to cancer progression by increasing proliferation, invasion, and/or metastasis. The ectodomain of this protein mediates bacterial adhesion to mammalian cells and the cytoplasmic domain is required for internalization. This gene is present in a gene cluster with other members of the cadherin family on chromosome 16. cadherin 1
ENSG00000136848 NA 153090 DAB2IP DAB2IP is a Ras (MIM 190020) GTPase-activating protein (GAP) that acts as a tumor suppressor. The DAB2IP gene is inactivated by methylation in prostate and breast cancers (Yano et al., 2005 [PubMed 15386433]). DAB2 interacting protein
ENSG00000183779 NA 80139 ZNF703 NA zinc finger protein 703
ENSG00000152583 NA 8404 SPARCL1 NA SPARC like 1
ENSG00000120278 NA 57480 PLEKHG1 NA pleckstrin homology and RhoGEF domain containing G1
ENSG00000099282 NA 23555 TSPAN15 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. The use of alternate polyadenylation sites has been found for this gene. tetraspanin 15
ENSG00000198719 NA 28514 DLL1 DLL1 is a human homolog of the Notch Delta ligand and is a member of the delta/serrate/jagged family. It plays a role in mediating cell fate decisions during hematopoiesis. It may play a role in cell-to-cell communication. delta like canonical Notch ligand 1
ENSG00000101447 NA 81610 FAM83D NA family with sequence similarity 83 member D
ENSG00000257026 TRUE NA NA NA NA
ENSG00000136002 NA 50649 ARHGEF4 Rho GTPases play a fundamental role in numerous cellular processes that are initiated by extracellular stimuli that work through G protein coupled receptors. The protein encoded by this gene may form complex with G proteins and stimulate Rho-dependent signals. Multiple alternatively spliced transcript variants encoding different isoforms have been found, but the full-length nature of some variants has not been determined. Rho guanine nucleotide exchange factor 4
ENSG00000135744 NA 183 AGT The protein encoded by this gene, pre-angiotensinogen or angiotensinogen precursor, is expressed in the liver and is cleaved by the enzyme renin in response to lowered blood pressure. The resulting product, angiotensin I, is then cleaved by angiotensin converting enzyme (ACE) to generate the physiologically active enzyme angiotensin II. The protein is involved in maintaining blood pressure and in the pathogenesis of essential hypertension and preeclampsia. Mutations in this gene are associated with susceptibility to essential hypertension, and can cause renal tubular dysgenesis, a severe disorder of renal tubular development. Defects in this gene have also been associated with non-familial structural atrial fibrillation, and inflammatory bowel disease. angiotensinogen
ENSG00000154721 NA 58494 JAM2 This gene belongs to the immunoglobulin superfamily, and the junctional adhesion molecule (JAM) family. The protein encoded by this gene is a type I membrane protein that is localized at the tight junctions of both epithelial and endothelial cells. It acts as an adhesive ligand for interacting with a variety of immune cell types, and may play a role in lymphocyte homing to secondary lymphoid organs. Alternatively spliced transcript variants have been found for this gene. junctional adhesion molecule 2
ENSG00000167191 NA 51704 GPRC5B This gene encodes a member of the type 3 G protein-coupled receptor family. Members of this superfamily are characterized by a signature 7-transmembrane domain motif. The encoded protein may modulate insulin secretion and increased protein expression is associated with type 2 diabetes. Alternative splicing results in multiple transcript variants. G protein-coupled receptor class C group 5 member B
ENSG00000206535 NA 348801 LNP1 NA leukemia NUP98 fusion partner 1
ENSG00000137269 NA 55227 LRRC1 NA leucine rich repeat containing 1
ENSG00000108852 NA 4355 MPP2 Palmitoylated membrane protein 2 is a member of a family of membrane-associated proteins termed MAGUKs (membrane-associated guanylate kinase homologs). MAGUKs interact with the cytoskeleton and regulate cell proliferation, signaling pathways, and intracellular junctions. Palmitoylated membrane protein 2 contains a conserved sequence, called the SH3 (src homology 3) motif, found in several other proteins that associate with the cytoskeleton and are suspected to play important roles in signal transduction. membrane palmitoylated protein 2
ENSG00000271218 NA ENSG00000271218 RP3-523E19.2 NA NA
ENSG00000229953 NA ENSG00000229953 RP11-284F21.7 NA NA
ENSG00000268751 NA 643719 SCGB1B2P NA secretoglobin family 1B member 2, pseudogene
ENSG00000179057 NA 283284 IGSF22 NA immunoglobulin superfamily member 22
ENSG00000186260 NA 57496 MKL2 NA MKL1/myocardin like 2
ENSG00000081803 NA 93664 CADPS2 This gene encodes a member of the calcium-dependent activator of secretion (CAPS) protein family, which are calcium binding proteins that regulate the exocytosis of synaptic and dense-core vesicles in neurons and neuroendocrine cells. Mutations in this gene may contribute to autism susceptibility. Multiple transcript variants encoding different isoforms have been found for this gene. Ca2+ dependent secretion activator 2
ENSG00000272468 NA ENSG00000272468 RP1-86C11.7 NA NA
ENSG00000003147 NA 3382 ICA1 This gene encodes a protein with an arfaptin homology domain that is found both in the cytosol and as membrane-bound form on the Golgi complex and immature secretory granules. This protein is believed to be an autoantigen in insulin-dependent diabetes mellitus and primary Sjogren’s syndrome. Several transcript variants encoding two different isoforms have been found for this gene. islet cell autoantigen 1
ENSG00000111879 NA 79632 FAM184A NA family with sequence similarity 184 member A
ENSG00000156968 NA 255027 MPV17L NA MPV17 mitochondrial inner membrane protein like
ENSG00000161835 NA 160622 GRASP This gene encodes a protein that functions as a molecular scaffold, linking receptors, including group 1 metabotropic glutamate receptors, to neuronal proteins. The encoded protein contains conserved domains, including a leucine zipper sequence, PDZ domain and a C-terminal PDZ-binding motif. Alternately spliced transcript variants have been observed for this gene. GRP1 (general receptor for phosphoinositides 1)-associated scaffold protein
ENSG00000156113 NA 3778 KCNMA1 MaxiK channels are large conductance, voltage and calcium-sensitive potassium channels which are fundamental to the control of smooth muscle tone and neuronal excitability. MaxiK channels can be formed by 2 subunits: the pore-forming alpha subunit, which is the product of this gene, and the modulatory beta subunit. Intracellular calcium regulates the physical association between the alpha and beta subunits. Alternatively spliced transcript variants encoding different isoforms have been identified. potassium calcium-activated channel subfamily M alpha 1
ENSG00000162438 NA 11330 CTRC This gene encodes a member of the peptidase S1 family. The encoded protein is a serum calcium-decreasing factor that has chymotrypsin-like protease activity. Alternatively spliced transcript variants have been observed, but their full-length nature has not been determined. chymotrypsin C
ENSG00000063180 NA 770 CA11 Carbonic anhydrases (CAs) are a large family of zinc metalloenzymes that catalyze the reversible hydration of carbon dioxide. They participate in a variety of biological processes, including respiration, calcification, acid-base balance, bone resorption, and the formation of aqueous humor, cerebrospinal fluid, saliva, and gastric acid. They show extensive diversity in tissue distribution and in their subcellular localization. CA XI is likely a secreted protein, however, radical changes at active site residues completely conserved in CA isozymes with catalytic activity, make it unlikely that it has carbonic anhydrase activity. It shares properties in common with two other acatalytic CA isoforms, CA VIII and CA X. CA XI is most abundantly expressed in brain, and may play a general role in the central nervous system. carbonic anhydrase 11
ENSG00000172264 NA 140733 MACROD2 NA MACRO domain containing 2
ENSG00000186352 NA 353322 ANKRD37 NA ankyrin repeat domain 37
ENSG00000136160 NA 1910 EDNRB The protein encoded by this gene is a G protein-coupled receptor which activates a phosphatidylinositol-calcium second messenger system. Its ligand, endothelin, consists of a family of three potent vasoactive peptides: ET1, ET2, and ET3. Studies suggest that the multigenic disorder, Hirschsprung disease type 2, is due to mutations in the endothelin receptor type B gene. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. endothelin receptor type B
ENSG00000129595 NA 64097 EPB41L4A Members of the band 4.1 protein superfamily, including EPB41L4A, are thought to regulate the interaction between the cytoskeleton and plasma membrane (Ishiguro et al., 2000 [PubMed 10874211]). erythrocyte membrane protein band 4.1 like 4A
ENSG00000215845 NA 100131187 TSTD1 NA thiosulfate sulfurtransferase like domain containing 1
ENSG00000106123 NA 2051 EPHB6 This gene encodes a member of a family of transmembrane proteins that function as receptors for ephrin-B family proteins. Unlike other members of this family, the encoded protein does not contain a functional kinase domain. Activity of this protein can influence cell adhesion and migration. Expression of this gene is downregulated during tumor progression, suggesting that the protein may suppress tumor invasion and metastasis. Alternative splicing results in multiple transcript variants. EPH receptor B6
ENSG00000188763 NA 8326 FZD9 Members of the ‘frizzled’ gene family encode 7-transmembrane domain proteins that are receptors for Wnt signaling proteins. The FZD9 gene is located within the Williams syndrome common deletion region of chromosome 7, and heterozygous deletion of the FZD9 gene may contribute to the Williams syndrome phenotype. FZD9 is expressed predominantly in brain, testis, eye, skeletal muscle, and kidney. frizzled class receptor 9
ENSG00000143536 NA 49860 CRNN This gene encodes a member of the ‘fused gene’ family of proteins, which contain N-terminus EF-hand domains and multiple tandem peptide repeats. The encoded protein contains two EF-hand Ca2+ binding domains in its N-terminus and two glutamine- and threonine-rich 60 amino acid repeats in its C-terminus. This gene, also known as squamous epithelial heat shock protein 53, may play a role in the mucosal/epithelial immune response and epidermal differentiation. cornulin
ENSG00000072201 NA 84708 LNX1 This gene encodes a membrane-bound protein that is involved in signal transduction and protein interactions. The encoded product is an E3 ubiquitin-protein ligase, which mediates ubiquitination and subsequent proteasomal degradation of proteins containing phosphotyrosine binding (PTB) domains. This protein may play an important role in tumorigenesis. Alternatively spliced transcript variants encoding distinct isoforms have been described. A pseudogene, which is located on chromosome 17, has been identified for this gene. ligand of numb-protein X 1
ENSG00000171159 NA 79095 C9orf16 NA chromosome 9 open reading frame 16
ENSG00000183049 NA 57118 CAMK1D This gene is a member of the calcium/calmodulin-dependent protein kinase 1 family, a subfamily of the serine/threonine kinases. The encoded protein is a component of the calcium-regulated calmodulin-dependent protein kinase cascade. It has been associated with multiple processes including regulation of granulocyte function, activation of CREB-dependent gene transcription, aldosterone synthesis, differentiation and activation of neutrophil cells, and apoptosis of erythroleukemia cells. Alternatively spliced transcript variants encoding different isoforms of this gene have been described. calcium/calmodulin dependent protein kinase ID
ENSG00000076554 NA 7163 TPD52 NA tumor protein D52
ENSG00000188779 NA 390598 SKOR1 NA SKI family transcriptional corepressor 1
ENSG00000162772 NA 467 ATF3 This gene encodes a member of the mammalian activation transcription factor/cAMP responsive element-binding (CREB) protein family of transcription factors. This gene is induced by a variety of signals, including many of those encountered by cancer cells, and is involved in the complex process of cellular stress response. Multiple transcript variants encoding different isoforms have been found for this gene. It is possible that alternative splicing of this gene may be physiologically important in the regulation of target genes. activating transcription factor 3
ENSG00000261113 NA ENSG00000261113 RP11-141O15.1 NA NA
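# Save the queried Ensembl gene IDs for cluster 13, one per line, without header, row names, or quotes.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 13, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);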
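
The annotation tables in this report do not always list their columns in the same order, and some queried IDs come back with `notfound` set to TRUE and no annotation. The following is a minimal sketch, not part of the original analysis, of how the result could be put into a fixed column order with the unmatched IDs split off before printing or saving. `tidy_annotations` is a hypothetical helper name, and the toy input simply reuses three rows from the table above so that the example runs without a network call to mygene.

```r
# Hypothetical helper (an assumption, not the author's code): reorder the
# annotation columns and separate the IDs that could not be resolved.
tidy_annotations <- function(out) {
  df <- as.data.frame(out)
  wanted <- c("query", "symbol", "name", "summary", "notfound")
  # Add any expected column that is absent so the selection below cannot fail
  for (col in setdiff(wanted, names(df))) df[[col]] <- NA
  # A row counts as unmatched only when notfound is explicitly TRUE
  missing <- !is.na(df$notfound) & as.logical(df$notfound)
  list(found = df[!missing, setdiff(wanted, "notfound"), drop = FALSE],
       not_found = as.character(df$query[missing]))
}

# Toy input copied from three rows of the table above
toy <- data.frame(
  query = c("ENSG00000169758", "ENSG00000184674", "ENSG00000169116"),
  notfound = c(NA, TRUE, NA),
  symbol = c("TMEM266", NA, "PARM1"),
  name = c("transmembrane protein 266", NA,
           "prostate androgen-regulated mucin-like protein 1"),
  summary = NA_character_,
  stringsAsFactors = FALSE
)

res <- tidy_annotations(toy)
res$not_found      # IDs without an annotation
res$found$symbol   # symbols of the annotated genes
```

With a real query, `tidy_annotations(out)$found` could be passed to `kable()` in place of `as.data.frame(out)`, assuming the returned object converts cleanly with `as.data.frame()`.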

Factor 14 Annotations

out <- mygene::queryMany(gene_list[14,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
summary X_id symbol name query notfound
This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. 5967 REG1A regenerating family member 1 alpha ENSG00000115386 NA
NA NA NA NA ENSG00000165862 TRUE
This gene encodes a protein that is a member of the dickkopf family. It is a secreted protein with two cysteine rich regions and is involved in embryonic development through its inhibition of the WNT signaling pathway. Elevated levels of DKK1 in bone marrow plasma and peripheral blood is associated with the presence of osteolytic bone lesions in patients with multiple myeloma. 22943 DKK1 dickkopf WNT signaling pathway inhibitor 1 ENSG00000107984 NA
NA 123036 TC2N tandem C2 domains, nuclear ENSG00000165929 NA
NA 132299 OCIAD2 OCIA domain containing 2 ENSG00000145247 NA
The protein encoded by this gene is a serine/threonine kinase that may be involved in the regulation of chromatin assembly. The encoded protein is only active when it is phosphorylated, and this phosphorylation is cell cycle-dependent, with the maximal activity of this protein coming during S phase. The catalytic activity of this protein is diminished by DNA damage and by blockage of DNA replication. Three transcript variants encoding different isoforms have been found for this gene. 9874 TLK1 tousled like kinase 1 ENSG00000198586 NA
The 26S proteasome is a multicatalytic proteinase complex with a highly ordered structure composed of 2 complexes, a 20S core and a 19S regulator. The 20S core is composed of 4 rings of 28 non-identical subunits; 2 rings are composed of 7 alpha subunits and 2 rings are composed of 7 beta subunits. The 19S regulator is composed of a base, which contains 6 ATPase subunits and 2 non-ATPase subunits, and a lid, which contains up to 10 non-ATPase subunits. Proteasomes are distributed throughout eukaryotic cells at a high concentration and cleave peptides in an ATP/ubiquitin-dependent process in a non-lysosomal pathway. An essential function of a modified proteasome, the immunoproteasome, is the processing of class I MHC peptides. The immunoproteasome contains an alternate regulator, referred to as the 11S regulator or PA28, that replaces the 19S regulator. Three subunits (alpha, beta and gamma) of the 11S regulator have been identified. This gene encodes the beta subunit of the 11S regulator, one of the two 11S subunits that is induced by gamma-interferon. Three beta and three alpha subunits combine to form a heterohexameric ring. Six pseudogenes have been identified on chromosomes 4, 5, 8, 10 and 13. 5721 PSME2 proteasome activator subunit 2 ENSG00000100911 NA
This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. 5644 PRSS1 protease, serine 1 ENSG00000204983 NA
The protein encoded by this gene is a member of the TNF-receptor superfamily. This receptor is required for generation and long-term maintenance of T cell immunity. It binds to ligand CD70, and plays a key role in regulating B-cell activation and immunoglobulin synthesis. This receptor transduces signals that lead to the activation of NF-kappaB and MAPK8/JNK. Adaptor proteins TRAF2 and TRAF5 have been shown to mediate the signaling process of this receptor. CD27-binding protein (SIVA), a proapoptotic protein, can bind to this receptor and is thought to play an important role in the apoptosis induced by this receptor. 939 CD27 CD27 molecule ENSG00000139193 NA
NA 122618 PLD4 phospholipase D family member 4 ENSG00000166428 NA
Members of the F-box protein family, such as FBXO46, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]). 23403 FBXO46 F-box protein 46 ENSG00000177051 NA
NA ENSG00000233849 AC022201.5 NA ENSG00000233849 NA
NA 6119 RPA3 replication protein A3 ENSG00000106399 NA
This gene encodes a protein containing three zinc finger domains and a nuclear localization signal. The mRNA and the protein of this gene are upregulated by wildtype p53 and overexpression of this gene inhibits tumor cell growth, suggesting that this gene may have a role in the p53-dependent growth regulatory pathway. Alternative splicing of this gene results in two transcript variants encoding two isoforms differing in only one amino acid. 64393 ZMAT3 zinc finger matrin-type 3 ENSG00000172667 NA
SAS6 is necessary for centrosome duplication and functions during procentriole formation; SAS6 functions to ensure that each centriole seeds the formation of a single procentriole per cell cycle (Strnad et al., 2007 [PubMed 17681132]). 163786 SASS6 SAS-6 centriolar assembly protein ENSG00000156876 NA
NA 117584 RFFL ring finger and FYVE-like domain containing E3 ubiquitin protein ligase ENSG00000092871 NA
This gene encodes the medium-chain specific (C4 to C12 straight chain) acyl-Coenzyme A dehydrogenase. The homotetramer enzyme catalyzes the initial step of the mitochondrial fatty acid beta-oxidation pathway. Defects in this gene cause medium-chain acyl-CoA dehydrogenase deficiency, a disease characterized by hepatic dysfunction, fasting hypoglycemia, and encephalopathy, which can result in infantile death. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 34 ACADM acyl-CoA dehydrogenase, C-4 to C-12 straight chain ENSG00000117054 NA
NA ENSG00000237950 RP11-7O11.3 NA ENSG00000237950 NA
NA ENSG00000183444 OR7E38P olfactory receptor family 7 subfamily E member 38 pseudogene ENSG00000183444 NA
NA 11179 ZNF277 zinc finger protein 277 ENSG00000198839 NA
The protein encoded by this gene is a member of the STAT protein family. In response to cytokines and growth factors, STAT family members are phosphorylated by the receptor associated kinases, and then form homo- or heterodimers that translocate to the cell nucleus where they act as transcription activators. This protein can be activated by various ligands including interferon-alpha, interferon-gamma, EGF, PDGF and IL6. This protein mediates the expression of a variety of genes, which is thought to be important for cell viability in response to different cell stimuli and pathogens. Two alternatively spliced transcript variants encoding distinct isoforms have been described. 6772 STAT1 signal transducer and activator of transcription 1 ENSG00000115415 NA
NA ENSG00000230177 RP5-1112D6.4 NA ENSG00000230177 NA
This gene encodes a basic, proline-rich, 15-kD protein. The protein acts as a positive mediator of programmed cell death that is induced by interferon-gamma. Alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. 1611 DAP death-associated protein ENSG00000112977 NA
The protein encoded by this gene is a choline dehydrogenase that localizes to the mitochondrion. Variations in this gene can affect susceptibility to choline deficiency. A few transcript variants have been found for this gene, but the full-length nature of only one has been characterized to date. 55349 CHDH choline dehydrogenase ENSG00000016391 NA
This gene encodes a member of the ATP-dependent DNA ligase protein family. The encoded protein functions in DNA replication, recombination, and the base excision repair process. Mutations in this gene that lead to DNA ligase I deficiency result in immunodeficiency and increased sensitivity to DNA-damaging agents. Disruption of this gene may also be associated with a variety of cancers. Alternative splicing results in multiple transcript variants. 3978 LIG1 DNA ligase 1 ENSG00000105486 NA
The protein encoded by this gene belongs to the biotin and lipoic acid synthetases family. It localizes in mitochondrion and plays an important role in alpha-(+)-lipoic acid synthesis. It may also function in the sulfur insertion chemistry in lipoate biosynthesis. Alternative splicing occurs at this locus and two transcript variants encoding distinct isoforms have been identified. 11019 LIAS lipoic acid synthetase ENSG00000121897 NA
While the exact function of the protein encoded by this gene is not known, it belongs to the 5’(3’)-deoxyribonucleotidase family. 221294 NT5DC1 5’-nucleotidase domain containing 1 ENSG00000178425 NA
NA ENSG00000203644 RP11-332M2.1 NA ENSG00000203644 NA
CENPQ is a subunit of a CENPH (MIM 605607)-CENPI (MIM 300065)-associated centromeric complex that targets CENPA (MIM 117139) to centromeres and is required for proper kinetochore function and mitotic progression (Okada et al., 2006 [PubMed 16622420]). 55166 CENPQ centromere protein Q ENSG00000031691 NA
NA 84333 PCGF5 polycomb group ring finger 5 ENSG00000180628 NA
This gene encodes a membrane-associated enzyme located at a branch point in the mevalonate pathway. The encoded protein is the first specific enzyme in cholesterol biosynthesis, catalyzing the dimerization of two molecules of farnesyl diphosphate in a two-step reaction to form squalene. 2222 FDFT1 farnesyl-diphosphate farnesyltransferase 1 ENSG00000079459 NA
This gene encodes a member of the sorting nexin family. Members of this family contain a phox (PX) domain, which is a phosphoinositide binding domain, and are involved in intracellular trafficking. This protein does not contain a coiled coil region, like some family members. This gene encodes a protein of unknown function. This gene results in two transcript variants differing in the 5’ UTR, but encoding the same protein. 29916 SNX11 sorting nexin 11 ENSG00000002919 NA
NA ENSG00000236326 RP3-486I3.5 NA ENSG00000236326 NA
The protein encoded by this gene is a nuclear transcriptional co-activator for peroxisome proliferator activated receptor alpha. The encoded protein contains a zinc finger and is a helicase that appears to be part of the peroxisome proliferator activated receptor alpha interacting complex. This gene is a member of the DNA2/NAM7 helicase gene family. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 85441 HELZ2 helicase with zinc finger 2 ENSG00000130589 NA
NA 121441 NEDD1 neural precursor cell expressed, developmentally down-regulated 1 ENSG00000139350 NA
HSD17B7 encodes an enzyme that functions both as a 17-beta-hydroxysteroid dehydrogenase (EC 1.1.1.62) in the biosynthesis of sex steroids and as a 3-ketosteroid reductase (EC 1.1.1.270) in the biosynthesis of cholesterol (Marijanovic et al., 2003 [PubMed 12829805]). 51478 HSD17B7 hydroxysteroid 17-beta dehydrogenase 7 ENSG00000132196 NA
Protein geranylgeranyltransferase type I (GGTase-I) transfers a geranylgeranyl group to the cysteine residue of candidate proteins containing a C-terminal CAAX motif in which ‘A’ is an aliphatic amino acid and ‘X’ is leucine (summarized by Zhang et al., 1994 [PubMed 8106351]). The enzyme is composed of a 48-kD alpha subunit (FNTA; MIM 134635) and a 43-kD beta subunit, encoded by the PGGT1B gene. The FNTA gene encodes the alpha subunit for both GGTase-I and the related enzyme farnesyltransferase. 5229 PGGT1B protein geranylgeranyltransferase type I subunit beta ENSG00000164219 NA
NA ENSG00000213621 RPSAP54 ribosomal protein SA pseudogene 54 ENSG00000213621 NA
This gene, a member of the histidine triad gene family, encodes a diadenosine 5’,5’’’-P1,P3-triphosphate hydrolase involved in purine metabolism. The gene encompasses the common fragile site FRA3B on chromosome 3, where carcinogen-induced damage can lead to translocations and aberrant transcripts of this gene. In fact, aberrant transcripts from this gene have been found in about half of all esophageal, stomach, and colon carcinomas. Alternatively spliced transcript variants have been found for this gene. 2272 FHIT fragile histidine triad ENSG00000189283 NA
NA 23306 NEMP1 nuclear envelope integral membrane protein 1 ENSG00000166881 NA
During the initiation of protein biosynthesis, initiation factor-2 (IF-2) promotes the binding of the initiator tRNA to the small subunit of the ribosome in a GTP-dependent manner. Prokaryotic IF-2 is a single polypeptide, while eukaryotic cytoplasmic IF-2 (eIF-2) is a trimeric protein. Bovine liver mitochondria contain IF-2(mt), an 85-kD monomeric protein that is equivalent to prokaryotic IF-2. The predicted 727-amino acid human protein contains a 29-amino acid presequence. Human IF-2(mt) shares 32 to 38% amino acid sequence identity with yeast IF-2(mt) and several prokaryotic IF-2s, with the greatest degree of conservation in the G domains of the proteins. 4528 MTIF2 mitochondrial translational initiation factor 2 ENSG00000085760 NA
NA ENSG00000229931 RP1-151F17.1 NA ENSG00000229931 NA
NA 79603 CERS4 ceramide synthase 4 ENSG00000090661 NA
The protein encoded by this gene is a subunit of the propionyl-CoA carboxylase (PCC) enzyme, which is involved in the catabolism of propionyl-CoA. PCC is a mitochondrial enzyme that probably acts as a dodecamer of six alpha subunits and six beta subunits. This gene encodes the beta subunit of PCC. Defects in this gene are a cause of propionic acidemia type II (PA-2). Multiple transcript variants encoding different isoforms have been found for this gene. 5096 PCCB propionyl-CoA carboxylase beta subunit ENSG00000114054 NA
This gene encodes a protein that is involved as a negative regulator of GSK3-beta in the Wnt signaling pathway. The encoded protein may play a role in the retinoic acid signaling pathway by regulating the functional interactions between GSK3-beta, beta-catenin and cyclin D1, and it regulates the beta-catenin/N-cadherin pool. The encoded protein contains a GSK3-beta interacting domain (GID) in its C-terminus, which is similar to the GID of Axin. The protein also contains an evolutionarily conserved RII-binding domain, which facilitates binding with protein kinase-A and GSK3-beta, enabling its role as an A-kinase anchoring protein. Alternatively spliced transcript variants have been observed for this gene. 51527 GSKIP GSK3B interacting protein ENSG00000100744 NA
This gene encodes a kinetochore protein that functions as part of the minichromosome instability-12 centromere complex. The encoded protein is required for proper kinetochore assembly and progression through the cell cycle. Alternative splicing results in multiple transcript variants. 79980 DSN1 DSN1 homolog, MIS12 kinetochore complex component ENSG00000149636 NA
This gene encodes a protein that associates with the enzyme phosphoribosylpyrophosphate synthetase (PRS). PRS catalyzes the formation of phosphoribosylpyrophosphate which is a substrate for synthesis of purine and pyrimidine nucleotides, histidine, tryptophan and NAD. PRS exists as a complex with two catalytic subunits and two associated subunits. This gene encodes a non-catalytic associated subunit of PRS. Alternate splicing results in multiple transcript variants. 5636 PRPSAP2 phosphoribosyl pyrophosphate synthetase associated protein 2 ENSG00000141127 NA
This gene encodes a member of the 5’-nucleotidase family of enzymes that catalyze the dephosphorylation of nucleoside 5’-monophosphates. The encoded protein is the type 1 isozyme of pyrimidine 5’ nucleotidase and catalyzes the dephosphorylation of pyrimidine 5’ monophosphates. Mutations in this gene are a cause of hemolytic anemia due to uridine 5-prime monophosphate hydrolase deficiency. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene, and pseudogenes of this gene are located on the long arm of chromosomes 3 and 4. 51251 NT5C3A 5’-nucleotidase, cytosolic IIIA ENSG00000122643 NA
NA NA NA NA ENSG00000233137 TRUE
The protein encoded by this gene is a dimethyltransferase that methylates the conserved stem loop of mitochondrial 12S rRNA. The encoded protein also is part of the basal mitochondrial transcription complex and is necessary for mitochondrial gene expression. The methylation and transcriptional activities of this protein are independent of one another. Variations in this gene may influence the severity of aminoglycoside-induced deafness (AID). 51106 TFB1M transcription factor B1, mitochondrial ENSG00000029639 NA
NA 26148 C10orf12 chromosome 10 open reading frame 12 ENSG00000155640 NA
NA ENSG00000216895 AC009403.2 NA ENSG00000216895 NA
NA 283643 C14orf80 chromosome 14 open reading frame 80 ENSG00000185347 NA
NA 57001 SDHAF3 succinate dehydrogenase complex assembly factor 3 ENSG00000196636 NA
NA ENSG00000261684 RP11-265N6.1 NA ENSG00000261684 NA
NA 339745 SPOPL speckle type BTB/POZ protein like ENSG00000144228 NA
The protein encoded by this gene is a death domain containing adaptor molecule that interacts with TNFRSF1A/TNFR1 and mediates programmed cell death signaling and NF-kappaB activation. This protein binds adaptor protein TRAF2, reduces the recruitment of inhibitor-of-apoptosis proteins (IAPs) by TRAF2, and thus suppresses TRAF2 mediated apoptosis. This protein can also interact with receptor TNFRSF6/FAS and adaptor protein FADD/MORT1, and is involved in the Fas-induced cell death pathway. 8717 TRADD TNFRSF1A associated via death domain ENSG00000102871 NA
NA NA NA NA ENSG00000129282 TRUE
Pseudouridination, the isomerization of uridine to pseudouridine, is the most common posttranscriptional nucleotide modification found in RNA and is essential for biologic functions such as spliceosome biogenesis. Pseudouridylate synthases, such as PUS10, catalyze pseudouridination of structural RNAs, including transfer, ribosomal, and splicing RNAs. These enzymes also act as RNA chaperones, facilitating the correct folding and assembly of tRNAs (McCleverty et al., 2007 [PubMed 17900615]). 150962 PUS10 pseudouridylate synthase 10 ENSG00000162927 NA
NA 10362 HMG20B high mobility group 20B ENSG00000064961 NA
NA 195828 ZNF367 zinc finger protein 367 ENSG00000165244 NA
NA 100131187 TSTD1 thiosulfate sulfurtransferase like domain containing 1 ENSG00000215845 NA
NA 168374 ZNF92 zinc finger protein 92 ENSG00000146757 NA
The protein encoded by this gene belongs to the class-II aminoacyl-tRNA synthetase family. It is a mitochondrial enzyme that specifically aminoacylates aspartyl-tRNA. Mutations in this gene are associated with leukoencephalopathy with brainstem and spinal cord involvement and lactate elevation (LBSL). 55157 DARS2 aspartyl-tRNA synthetase 2, mitochondrial ENSG00000117593 NA
NA ENSG00000223551 TMSB4XP4 thymosin beta 4, X-linked pseudogene 4 ENSG00000223551 NA
This gene encodes an enzyme that plays a major role in polyamine metabolism and is important for the salvage of both adenine and methionine. The encoded enzyme is deficient in many cancers because this gene and the tumor suppressor p16 gene are co-deleted. Multiple alternatively spliced transcript variants have been described for this gene, but their full-length natures remain unknown. 4507 MTAP methylthioadenosine phosphorylase ENSG00000099810 NA
NA ENSG00000212789 ST13P5 suppression of tumorigenicity 13 (colon carcinoma) (Hsp70 interacting protein) pseudogene 5 ENSG00000212789 NA
NA 55732 C1orf112 chromosome 1 open reading frame 112 ENSG00000000460 NA
NA ENSG00000182165 TP53TG1 TP53 target 1 (non-protein coding) ENSG00000182165 NA
The protein encoded by this gene is a type I membrane protein that forms one of the two chains of a receptor for interferons alpha and beta. Binding and activation of the receptor stimulates Janus protein kinases, which in turn phosphorylate several proteins, including STAT1 and STAT2. Multiple transcript variants encoding at least two different isoforms have been found for this gene. 3455 IFNAR2 interferon alpha and beta receptor subunit 2 ENSG00000159110 NA
NA 644591 PPIAL4G peptidylprolyl isomerase A like 4G ENSG00000236334 NA
NA 196743 PAOX polyamine oxidase (exo-N4-amino) ENSG00000148832 NA
NA ENSG00000218175 AC016739.2 NA ENSG00000218175 NA
This gene encodes a member of the WD repeat protein family. WD repeats are minimally conserved regions of approximately 40 amino acids typically bracketed by gly-his and trp-asp (GH-WD), which may facilitate formation of heterotrimeric or multiprotein complexes. Members of this family are involved in a variety of cellular processes, including cell cycle progression, signal transduction, apoptosis, and gene regulation. Defects in this gene are a cause of short-rib thoracic dysplasia 11 with or without polydactyly. 89891 WDR34 WD repeat domain 34 ENSG00000119333 NA
This gene encodes the enzyme responsible for hydrolysis of both HIBYL-CoA and beta-hydroxypropionyl-CoA. Mutations in this gene have been associated with 3-hydroxyisobutyryl-CoA hydrolase deficiency. Alternative splicing results in multiple transcript variants. 26275 HIBCH 3-hydroxyisobutyryl-CoA hydrolase ENSG00000198130 NA
This gene encodes a member of the EF-hand domain-containing calcium-binding superfamily. The encoded protein interacts with many other proteins, including the platelet integrin alpha-IIb-beta-3, DNA-dependent protein kinase, presenilin-2, focal adhesion kinase, p21 activated kinase, and protein kinase D. The encoded protein may be involved in cell survival and proliferation, and is associated with several disease states including cancer and Alzheimer’s disease. Alternative splicing results in multiple transcript variants. 10519 CIB1 calcium and integrin binding 1 ENSG00000185043 NA
IRF7 encodes interferon regulatory factor 7, a member of the interferon regulatory transcription factor (IRF) family. IRF7 has been shown to play a role in the transcriptional activation of virus-inducible cellular genes, including interferon beta chain genes. Inducible expression of IRF7 is largely restricted to lymphoid tissue. Multiple IRF7 transcript variants have been identified, although the functional consequences of these have not yet been established. 3665 IRF7 interferon regulatory factor 7 ENSG00000185507 NA
NA 25771 TBC1D22A TBC1 domain family member 22A ENSG00000054611 NA
NA 339559 ZFP69 ZFP69 zinc finger protein ENSG00000187815 NA
This gene encodes a nuclear protein involved in homologous recombination, telomere length maintenance, and DNA double-strand break repair. By itself, the protein has 3’ to 5’ exonuclease activity and endonuclease activity. The protein forms a complex with the RAD50 homolog; this complex is required for nonhomologous joining of DNA ends and possesses increased single-stranded DNA endonuclease and 3’ to 5’ exonuclease activities. In conjunction with a DNA ligase, this protein promotes the joining of noncomplementary ends in vitro using short homologies near the ends of the DNA fragments. This gene has a pseudogene on chromosome 3. Alternative splicing of this gene results in two transcript variants encoding different isoforms. 4361 MRE11A MRE11 homolog A, double strand break repair nuclease ENSG00000020922 NA
NA 55734 ZFP64 ZFP64 zinc finger protein ENSG00000020256 NA
NA 63979 FIGNL1 fidgetin like 1 ENSG00000132436 NA
The eukaryotic cell cycle is governed by cyclin-dependent protein kinases (CDKs) whose activities are regulated by cyclins and CDK inhibitors. The protein encoded by this gene is a member of the cyclin family and contains the cyclin box. The encoded protein lacks the protein destabilizing (PEST) sequence that is present in other family members. Transcriptional activation of this gene can be induced by tumor protein p53. Two transcript variants encoding the same protein have been identified for this gene. 900 CCNG1 cyclin G1 ENSG00000113328 NA
Phosphatidylinositol 4-kinases (PI4Ks) phosphorylate phosphatidylinositol to generate phosphatidylinositol 4-phosphate (PIP), an immediate precursor of several important signaling and scaffolding molecules. PIP itself may also have direct functional and structural roles. PI4K2B is a primarily cytosolic PI4K that is recruited to membranes, where it stimulates phosphatidylinositol 4,5-bisphosphate synthesis (Wei et al., 2002 [PubMed 12324459]). 55300 PI4K2B phosphatidylinositol 4-kinase type 2 beta ENSG00000038210 NA
This gene is located in a region close to the locus of the pseudogene of chemokine (C-C motif) receptor-like 1 on chromosome 6. The specific function of this gene has not yet been determined. 25901 CCDC28A coiled-coil domain containing 28A ENSG00000024862 NA
This gene encodes a small, conserved protein of unknown function that is expressed in a variety of tissues. There are pseudogenes for this gene on chromosomes 6, 8, 16, and X. Alternative splicing results in multiple transcript variants. 201725 C4orf46 chromosome 4 open reading frame 46 ENSG00000205208 NA
The protein encoded by this gene belongs to the arginine N-methyltransferase family, whose members catalyze the sequential transfer of a methyl group from S-adenosyl-L-methionine to the side chain nitrogens of arginine residues within proteins, to form methylated arginine derivatives and S-adenosyl-L-homocysteine. This protein can catalyze both the formation of omega-N monomethylarginine and asymmetrical dimethylarginine, with a strong preference for the latter. It specifically mediates the asymmetric dimethylation of Arg2 of histone H3, and the methylated form represents a specific tag for epigenetic transcriptional repression. This protein also forms a complex with, and methylates, DNA polymerase beta, resulting in stimulation of polymerase activity by enhancing DNA binding and processivity. 55170 PRMT6 protein arginine methyltransferase 6 ENSG00000198890 NA
This gene encodes an NADPH sensor protein that preferentially binds to NADPH. The encoded protein also negatively regulates the activity of NF-kappaB in a ubiquitylation-dependent manner. It plays a key role in cellular antiviral response by negatively regulating the interferon response factor 3-mediated expression of interferon beta. Alternative splicing of this gene results in multiple transcript variants. 57407 NMRAL1 NmrA-like family domain containing 1 ENSG00000153406 NA
This gene encodes a member of the cysteine-aspartic acid protease (caspase) family. Sequential activation of caspases plays a central role in the execution-phase of cell apoptosis. Caspases exist as inactive proenzymes which undergo proteolytic processing at conserved aspartic residues to produce two subunits, large and small, that dimerize to form the active enzyme. The precursor of the encoded protein is cleaved by caspase 3 and 10, is activated upon cell death stimuli and induces apoptosis. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 840 CASP7 caspase 7 ENSG00000165806 NA
NA 55791 LRIF1 ligand dependent nuclear receptor interacting factor 1 ENSG00000121931 NA
NA ENSG00000269749 AC005614.5 NA ENSG00000269749 NA
NA 84191 FAM96A family with sequence similarity 96 member A ENSG00000166797 NA
NA 3157 HMGCS1 3-hydroxy-3-methylglutaryl-CoA synthase 1 ENSG00000112972 NA
NA ENSG00000269534 CTC-453G23.5 NA ENSG00000269534 NA
NA 79828 METTL8 methyltransferase like 8 ENSG00000123600 NA
NA ENSG00000261438 RP11-399O19.9 NA ENSG00000261438 NA
BORA is an activator of the protein kinase Aurora A (AURKA; MIM 603072), which is required for centrosome maturation, spindle assembly, and asymmetric protein localization during mitosis (Hutterer et al., 2006 [PubMed 16890155]). 79866 BORA bora, aurora kinase A activator ENSG00000136122 NA
Abscission, the separation of daughter cells at the end of cytokinesis, is effected by endosomal sorting complexes required for transport III (ESCRT-III). The protein encoded by this gene functions as a homodimer, with the N-termini binding to a subset of ESCRT-III subunits and the C-termini binding to membranes. The encoded protein regulates ESCRT-III activity and is required for proper cytokinesis. Several transcript variants encoding different isoforms have been found for this gene. 129531 MITD1 microtubule interacting and trafficking domain containing 1 ENSG00000158411 NA
NA 100506100 LOC100506100 uncharacterized LOC100506100 ENSG00000223478 NA
This gene encodes an evolutionarily conserved protein associated with cell apoptosis. The protein interacts with the serine/threonine protein kinase MST4 to modulate the extracellular signal-regulated kinase (ERK) pathway. It also interacts with and is phosphorylated by serine/threonine kinase 25, and is thought to function in a signaling pathway essential for vascular development. Mutations in this gene are one cause of cerebral cavernous malformations, which are vascular malformations that cause seizures and cerebral hemorrhages. Multiple alternatively spliced variants, encoding the same protein, have been identified. 11235 PDCD10 programmed cell death 10 ENSG00000114209 NA
# Save the Ensembl gene IDs queried for factor 14, one ID per line.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 14, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
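
Some rows in the annotation tables have notfound set to TRUE (for example ENSG00000233137 in the factor 14 table), meaning mygene returned no record for that query. The call above writes every query ID regardless. If only the resolved IDs are wanted downstream, they could be filtered first. The following is a minimal sketch under that assumption: found_queries is a hypothetical helper that is not part of the original analysis, and the toy data frame merely stands in for as.data.frame(out).

```r
# Hypothetical helper (not part of the original analysis): return the query
# IDs that were resolved, i.e. rows where `notfound` is not TRUE.
found_queries <- function(annot) {
  annot <- as.data.frame(annot)
  # If every query was resolved, the result may carry no `notfound` column.
  if (!"notfound" %in% names(annot)) {
    return(as.character(annot$query))
  }
  keep <- is.na(annot$notfound) | !annot$notfound
  as.character(annot$query[keep])
}

# Toy stand-in for as.data.frame(out); IDs copied from the factor 14 table.
toy <- data.frame(
  query    = c("ENSG00000100744", "ENSG00000233137", "ENSG00000149636"),
  symbol   = c("GSKIP", NA, "DSN1"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

resolved <- found_queries(toy)
print(resolved)

# Write to a temporary file so the sketch does not touch the project tree.
out_file <- tempfile(fileext = ".txt")
write.table(resolved, out_file,
            col.names = FALSE, row.names = FALSE, quote = FALSE)
```

Whether unresolved IDs should be dropped depends on what consumes the gene_names_clus files; keeping them, as the original code does, preserves the full top-100 list per factor.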

Factor 15 Annotations

out <- mygene::queryMany(gene_list[15,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id query name summary notfound
SERPINA1 5265 ENSG00000197249 serpin family A member 1 The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. NA
RP11-535A19.1 ENSG00000254814 ENSG00000254814 NA NA NA
RP11-426L16.3 ENSG00000225075 ENSG00000225075 NA NA NA
DBF4B 80174 ENSG00000161692 DBF4 zinc finger B This gene encodes a regulator of the cell division cycle 7 homolog (S. cerevisiae) protein, a serine-threonine kinase which links cell cycle regulation to genome duplication. This protein localizes to the nucleus and, in complex with the cell division cycle 7 homolog (S. cerevisiae) protein, may facilitate M phase progression. Alternative splicing results in multiple transcript variants. NA
RPS12P28 ENSG00000240494 ENSG00000240494 ribosomal protein S12 pseudogene 28 NA NA
ARF4-AS1 106144532 ENSG00000272146 ARF4 antisense RNA 1 NA NA
ZNF93 81931 ENSG00000184635 zinc finger protein 93 NA NA
PITPNA-AS1 100306951 ENSG00000236618 PITPNA antisense RNA 1 NA NA
CBX8 57332 ENSG00000141570 chromobox 8 NA NA
DUSP5 1847 ENSG00000138166 dual specificity phosphatase 5 The protein encoded by this gene is a member of the dual specificity protein phosphatase subfamily. These phosphatases inactivate their target kinases by dephosphorylating both the phosphoserine/threonine and phosphotyrosine residues. They negatively regulate members of the mitogen-activated protein (MAP) kinase superfamily (MAPK/ERK, SAPK/JNK, p38), which are associated with cellular proliferation and differentiation. Different members of the family of dual specificity phosphatases show distinct substrate specificities for various MAP kinases, different tissue distribution and subcellular localization, and different modes of inducibility of their expression by extracellular stimuli. This gene product inactivates ERK1, is expressed in a variety of tissues with the highest levels in pancreas and brain, and is localized in the nucleus. NA
TMEM99 147184 ENSG00000167920 transmembrane protein 99 NA NA
CEP152 22995 ENSG00000103995 centrosomal protein 152 This gene encodes a protein that is thought to be involved with centrosome function. Mutations in this gene have been associated with primary microcephaly (MCPH4). Alternative splicing results in multiple transcript variants. NA
ZNF90 7643 ENSG00000213988 zinc finger protein 90 NA NA
LOC100507291 100507291 ENSG00000248932 uncharacterized LOC100507291 NA NA
FAM86HP ENSG00000253540 ENSG00000253540 family with sequence similarity 86 member H, pseudogene NA NA
KPTN 11133 ENSG00000118162 kaptin (actin binding protein) This gene encodes a filamentous-actin-associated protein, which is involved in actin dynamics and plays an important role in neuromorphogenesis. Mutations in this gene result in recessive mental retardation-41. Alternatively spliced transcript variants have been found for this gene. NA
FBXO46 23403 ENSG00000177051 F-box protein 46 Members of the F-box protein family, such as FBXO46, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]). NA
OXT 5020 ENSG00000101405 oxytocin/neurophysin I prepropeptide This gene encodes a precursor protein that is processed to produce oxytocin and neurophysin I. Oxytocin is a posterior pituitary hormone which is synthesized as an inactive precursor in the hypothalamus along with its carrier protein neurophysin I. Together with neurophysin, it is packaged into neurosecretory vesicles and transported axonally to the nerve endings in the neurohypophysis, where it is either stored or secreted into the bloodstream. The precursor seems to be activated while it is being transported along the axon to the posterior pituitary. This hormone contracts smooth muscle during parturition and lactation. It is also involved in cognition, tolerance, adaptation and complex sexual and maternal behaviour, as well as in the regulation of water excretion and cardiovascular functions. NA
YTHDF3-AS1 101410533 ENSG00000270673 YTHDF3 antisense RNA 1 (head to head) NA NA
RP11-715F3.2 ENSG00000266783 ENSG00000266783 NA NA NA
TNFSF4 7292 ENSG00000117586 tumor necrosis factor superfamily member 4 This gene encodes a cytokine of the tumor necrosis factor (TNF) ligand family. The encoded protein functions in T cell antigen-presenting cell (APC) interactions and mediates adhesion of activated T cells to endothelial cells. Polymorphisms in this gene have been associated with Sjogren’s syndrome and systemic lupus erythematosus. Alternative splicing results in multiple transcript variants. NA
RP11-395A13.2 ENSG00000272667 ENSG00000272667 NA NA NA
TP53I13 90313 ENSG00000167543 tumor protein p53 inducible protein 13 NA NA
CTB-50L17.7 ENSG00000267030 ENSG00000267030 NA NA NA
MGMT 4255 ENSG00000170430 O-6-methylguanine-DNA methyltransferase Alkylating agents are potent carcinogens that can result in cell death, mutation and cancer. The protein encoded by this gene is a DNA repair protein that is involved in cellular defense against mutagenesis and toxicity from alkylating agents. The protein catalyzes transfer of methyl groups from O(6)-alkylguanine and other methylated moieties of the DNA to its own molecule, which repairs the toxic lesions. Methylation of the gene’s promoter has been associated with several cancer types, including colorectal cancer, lung cancer, lymphoma and glioblastoma. NA
C20orf96 140680 ENSG00000196476 chromosome 20 open reading frame 96 NA NA
RP4-714D9.2 ENSG00000241073 ENSG00000241073 NA NA NA
RCCD1 91433 ENSG00000166965 RCC1 domain containing 1 NA NA
CTA-984G1.5 ENSG00000237015 ENSG00000237015 NA NA NA
RP11-75L1.2 ENSG00000213443 ENSG00000213443 NA NA NA
LINC00665 100506930 ENSG00000232677 long intergenic non-protein coding RNA 665 NA NA
RP11-66B24.4 ENSG00000259583 ENSG00000259583 NA NA NA
DGAT2 84649 ENSG00000062282 diacylglycerol O-acyltransferase 2 This gene encodes one of two enzymes which catalyzes the final reaction in the synthesis of triglycerides in which diacylglycerol is covalently bound to long chain fatty acyl-CoAs. The encoded protein catalyzes this reaction at low concentrations of magnesium chloride while the other enzyme has high activity at high concentrations of magnesium chloride. Multiple transcript variants encoding different isoforms have been found for this gene. NA
BCL2L12 83596 ENSG00000126453 BCL2 like 12 This gene encodes a member of a family of proteins containing a Bcl-2 homology domain 2 (BH2). The encoded protein is an anti-apoptotic factor that acts as an inhibitor of caspases 3 and 7 in the cytoplasm. In the nucleus, it binds to the p53 tumor suppressor protein, preventing its association with target genes. Overexpression of this gene has been detected in a number of different cancers. There is a pseudogene for this gene on chromosome 3. Alternative splicing results in multiple transcript variants. NA
NUDT1 4521 ENSG00000106268 nudix hydrolase 1 Misincorporation of oxidized nucleoside triphosphates into DNA/RNA during replication and transcription can cause mutations that may result in carcinogenesis or neurodegeneration. The protein encoded by this gene is an enzyme that hydrolyzes oxidized purine nucleoside triphosphates, such as 8-oxo-dGTP, 8-oxo-dATP, 2-hydroxy-dATP, and 2-hydroxy rATP, to monophosphates, thereby preventing misincorporation. The encoded protein is localized mainly in the cytoplasm, with some in the mitochondria, suggesting that it is involved in the sanitization of nucleotide pools both for nuclear and mitochondrial genomes. Several alternatively spliced transcript variants, some of which encode distinct isoforms, have been identified. Additional variants have been observed, but their full-length natures have not been determined. A single-nucleotide polymorphism that results in the production of an additional, longer isoform (p26) has been described. NA
MOV10 4343 ENSG00000155363 Mov10 RISC complex RNA helicase NA NA
ZNF85 7639 ENSG00000105750 zinc finger protein 85 NA NA
DNAJC4 3338 ENSG00000110011 DnaJ heat shock protein family (Hsp40) member C4 NA NA
ANKRD53 79998 ENSG00000144031 ankyrin repeat domain 53 NA NA
RP11-505K9.1 ENSG00000260018 ENSG00000260018 NA NA NA
RP11-69H7.3 ENSG00000261779 ENSG00000261779 NA NA NA
PAFAH1B3 5050 ENSG00000079462 platelet activating factor acetylhydrolase 1b catalytic subunit 3 This gene encodes an acetylhydrolase that catalyzes the removal of an acetyl group from the glycerol backbone of platelet-activating factor. The encoded enzyme is a subunit of the platelet-activating factor acetylhydrolase isoform 1B complex, which consists of the catalytic beta and gamma subunits and the regulatory alpha subunit. This complex functions in brain development. A translocation between this gene on chromosome 19 and the CDC-like kinase 2 gene on chromosome 1 has been observed, and was associated with mental retardation, ataxia, and atrophy of the brain. Alternatively spliced transcript variants have been described. NA
HIGD2A 192286 ENSG00000146066 HIG1 hypoxia inducible domain family member 2A NA NA
UNG 7374 ENSG00000076248 uracil DNA glycosylase This gene encodes one of several uracil-DNA glycosylases. One important function of uracil-DNA glycosylases is to prevent mutagenesis by eliminating uracil from DNA molecules by cleaving the N-glycosylic bond and initiating the base-excision repair (BER) pathway. Uracil bases occur from cytosine deamination or misincorporation of dUMP residues. Alternative promoter usage and splicing of this gene leads to two different isoforms: the mitochondrial UNG1 and the nuclear UNG2. The UNG2 term was used as a previous symbol for the CCNO gene (GeneID 10309), which has been confused with this gene, in the literature and some databases. NA
SWI5 375757 ENSG00000175854 SWI5 homologous recombination repair protein NA NA
HEMK1 51409 ENSG00000114735 HemK methyltransferase family member 1 NA NA
CTD-2270L9.4 ENSG00000260136 ENSG00000260136 NA NA NA
WDR27 253769 ENSG00000184465 WD repeat domain 27 This gene encodes a protein with multiple WD repeats. Proteins with these repeats may form scaffolds for protein-protein interaction and play key roles in cell signalling. Alternative splicing results in multiple transcript variants, but the full-length structure of some of these variants cannot be determined. NA
RP11-426C22.5 ENSG00000260517 ENSG00000260517 NA NA NA
FAM200A 221786 ENSG00000221909 family with sequence similarity 200 member A This gene encodes a protein of unknown function. The protein is weakly similar to transposase-like proteins in human and mouse. NA
INTS6-AS1 ENSG00000236778 ENSG00000236778 INTS6 antisense RNA 1 NA NA
ATP23 91419 ENSG00000166896 ATP23 metallopeptidase and ATP synthase assembly factor homolog (S. cerevisiae) The protein encoded by this gene is amplified in glioblastomas and interacts with the DNA binding subunit of DNA-dependent protein kinase. This kinase is involved in double-strand break repair (DSB), and higher expression of the encoded protein increases the efficiency of DSB. In addition, comparison to orthologous proteins strongly suggests that this protein is a metalloprotease important in the biosynthesis of mitochondrial ATPase. Several transcript variants encoding different isoforms have been found for this gene. NA
AC011290.5 ENSG00000236015 ENSG00000236015 NA NA NA
CCDC61 729440 ENSG00000104983 coiled-coil domain containing 61 NA NA
CLDND2 125875 ENSG00000160318 claudin domain containing 2 NA NA
ADM5 199800 ENSG00000224420 adrenomedullin 5 (putative) NA NA
CTD-2369P2.4 ENSG00000267105 ENSG00000267105 NA NA NA
RP11-809O17.1 ENSG00000253210 ENSG00000253210 NA NA NA
C17orf58 284018 ENSG00000186665 chromosome 17 open reading frame 58 NA NA
RPL28 6158 ENSG00000108107 ribosomal protein L28 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L28E family of ribosomal proteins. It is located in the cytoplasm. Variable expression of this gene in colorectal cancers compared to adjacent normal tissues has been observed, although no correlation between the level of expression and the severity of the disease has been found. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Alternative splicing results in multiple transcript variants encoding distinct isoforms. NA
RPSAP18 ENSG00000224261 ENSG00000224261 ribosomal protein SA pseudogene 18 NA NA
NRM 11270 ENSG00000137404 nurim (nuclear envelope membrane protein) The protein encoded by this gene contains transmembrane domains and resides within the inner nuclear membrane, where it is tightly associated with the nucleus. This protein shares homology with isoprenylcysteine carboxymethyltransferase enzymes. Alternative splicing results in multiple transcript variants that encode different protein isoforms. NA
FANCG 2189 ENSG00000221829 Fanconi anemia complementation group G The Fanconi anemia complementation group (FANC) currently includes FANCA, FANCB, FANCC, FANCD1 (also called BRCA2), FANCD2, FANCE, FANCF, FANCG, FANCI, FANCJ (also called BRIP1), FANCL, FANCM and FANCN (also called PALB2). The previously defined group FANCH is the same as FANCA. Fanconi anemia is a genetically heterogeneous recessive disorder characterized by cytogenetic instability, hypersensitivity to DNA crosslinking agents, increased chromosomal breakage, and defective DNA repair. The members of the Fanconi anemia complementation group do not share sequence similarity; they are related by their assembly into a common nuclear protein complex. This gene encodes the protein for complementation group G. NA
CTD-2192J16.21 ENSG00000269560 ENSG00000269560 NA NA NA
GEMIN6 79833 ENSG00000152147 gem nuclear organelle associated protein 6 GEMIN6 is part of a large macromolecular complex, localized to both the cytoplasm and the nucleus, that plays a role in the cytoplasmic assembly of small nuclear ribonucleoproteins (snRNPs). Other members of this complex include SMN (MIM 600354), GEMIN2 (SIP1; MIM 602595), GEMIN3 (DDX20; MIM 606168), GEMIN4 (MIM 606969), and GEMIN5 (MIM 607005). NA
TICAM2 353376 ENSG00000243414 toll like receptor adaptor molecule 2 TIRP is a Toll/interleukin-1 receptor (IL1R; MIM 147810) (TIR) domain-containing adaptor protein involved in Toll receptor signaling (see TLR4; MIM 603030). NA
RGS5 8490 ENSG00000232995 regulator of G-protein signaling 5 This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. NA
RP4-622L5.7 ENSG00000224066 ENSG00000224066 NA NA NA
CTD-3138B18.5 ENSG00000268516 ENSG00000268516 NA NA NA
ZNF546 339327 ENSG00000187187 zinc finger protein 546 NA NA
RP11-119B16.2 ENSG00000229539 ENSG00000229539 NA NA NA
ST6GALNAC4P1 ENSG00000233469 ENSG00000233469 ST6 (alpha-N-acetyl-neuraminyl-2,3-beta-galactosyl-1,3)-N-acetylgalactosaminide alpha-2,6-sialyltransferase 4 pseudogene 1 NA NA
NA NA ENSG00000273116 NA NA TRUE
LRRC27 80313 ENSG00000148814 leucine rich repeat containing 27 NA NA
C4orf46 201725 ENSG00000205208 chromosome 4 open reading frame 46 This gene encodes a small, conserved protein of unknown function that is expressed in a variety of tissues. There are pseudogenes for this gene on chromosomes 6, 8, 16, and X. Alternative splicing results in multiple transcript variants. NA
ATP6AP1L 92270 ENSG00000205464 ATPase H+ transporting accessory protein 1 like NA NA
LOC101927151 101927151 ENSG00000267575 uncharacterized LOC101927151 NA NA
NA NA ENSG00000224956 NA NA TRUE
HHLA3 11147 ENSG00000197568 HERV-H LTR-associating 3 NA NA
AC009005.2 ENSG00000267751 ENSG00000267751 NA NA NA
ZNF671 79891 ENSG00000083814 zinc finger protein 671 NA NA
COMMD6 170622 ENSG00000188243 COMM domain containing 6 COMMD6 belongs to a family of NF-kappa-B (see RELA; MIM 164014)-inhibiting proteins characterized by the presence of a COMM domain (see COMMD1; MIM 607238) (de Bie et al., 2006 [PubMed 16573520]). NA
ZNF551 90233 ENSG00000204519 zinc finger protein 551 NA NA
C5orf63 401207 ENSG00000164241 chromosome 5 open reading frame 63 NA NA
DDX11-AS1 100506660 ENSG00000245614 DDX11 antisense RNA 1 NA NA
TRIM45 80263 ENSG00000134253 tripartite motif containing 45 This gene encodes a member of the tripartite motif family. The encoded protein may function as a transcriptional repressor of the mitogen-activated protein kinase pathway. Alternatively spliced transcript variants have been described. NA
FBF1 ENSG00000188878 ENSG00000188878 Fas (TNFRSF6) binding factor 1 NA NA
FASTKD1 79675 ENSG00000138399 FAST kinase domains 1 NA NA
MSH5 4439 ENSG00000204410 mutS homolog 5 This gene encodes a member of the mutS family of proteins that are involved in DNA mismatch repair and meiotic recombination. This protein is similar to a Saccharomyces cerevisiae protein that participates in segregation fidelity and crossing-over events during meiosis. This protein plays a role in promoting ionizing radiation-induced apoptosis. This protein forms hetero-oligomers with another member of this family, mutS homolog 4. Polymorphisms in this gene have been linked to various human diseases, including IgA deficiency, common variable immunodeficiency, and premature ovarian failure. Alternative splicing results in multiple transcript variants. Read-through transcription also exists between this gene and the downstream chromosome 6 open reading frame 26 (C6orf26) gene. NA
RP3-330M21.5 ENSG00000245261 ENSG00000245261 NA NA NA
ZNF692 55657 ENSG00000171163 zinc finger protein 692 NA NA
TMEM42 131616 ENSG00000169964 transmembrane protein 42 NA NA
RFTN1 23180 ENSG00000131378 raftlin, lipid raft linker 1 NA NA
CEP41 95681 ENSG00000106477 centrosomal protein 41 This gene encodes a centrosomal and microtubule-binding protein which is predicted to have two coiled-coil domains and a rhodanese domain. In human retinal pigment epithelial cells the protein localized to centrioles and cilia. Mutations in this gene have been associated with Joubert Syndrome 15; an autosomal recessive ciliopathy and neurological disorder. Alternative splicing results in multiple transcript variants. NA
TBC1D31 93594 ENSG00000156787 TBC1 domain family member 31 NA NA
NA NA ENSG00000259901 NA NA TRUE
RP11-421L21.3 ENSG00000233184 ENSG00000233184 NA NA NA
CCNE1 898 ENSG00000105173 cyclin E1 The protein encoded by this gene belongs to the highly conserved cyclin family, whose members are characterized by a dramatic periodicity in protein abundance through the cell cycle. Cyclins function as regulators of CDK kinases. Different cyclins exhibit distinct expression and degradation patterns which contribute to the temporal coordination of each mitotic event. This cyclin forms a complex with and functions as a regulatory subunit of CDK2, whose activity is required for cell cycle G1/S transition. This protein accumulates at the G1-S phase boundary and is degraded as cells progress through S phase. Overexpression of this gene has been observed in many tumors, which results in chromosome instability, and thus may contribute to tumorigenesis. This protein was found to associate with, and be involved in, the phosphorylation of NPAT protein (nuclear protein mapped to the ATM locus), which participates in cell-cycle regulated histone gene expression and plays a critical role in promoting cell-cycle progression in the absence of pRB. NA
RP11-521I2.3 ENSG00000260368 ENSG00000260368 NA NA NA
PBX3 5090 ENSG00000167081 PBX homeobox 3 NA NA
# Save the Ensembl IDs queried for factor 15, one per line, for downstream use.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 15, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);
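
The export step above is repeated once per factor. The following is a minimal sketch, not the original analysis code, of how it could be wrapped in a helper. The names `write_factor_genes` and `toy_gene_list`, the file-name prefix argument, and the temporary output directory are all hypothetical. The helper writes unique IDs, on the assumption that duplicates are unwanted: a query that returns several hits (ENSG00000105808 appears twice in the Factor 16 table) is otherwise repeated in `out$query`.

```r
# Hypothetical helper: write the unique Ensembl IDs of one factor, one per line.
# `out_dir` and `prefix` are illustrative arguments, not taken from the analysis.
write_factor_genes <- function(gene_ids, k, out_dir, prefix = "gene_names_clus_") {
  path <- file.path(out_dir, paste0(prefix, k, ".txt"))
  write.table(unique(gene_ids), path,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Toy stand-in for `gene_list` (rows = factors, columns = top genes),
# so the sketch runs without the SFA outputs or a network query.
toy_gene_list <- rbind(
  c("ENSG00000168993", "ENSG00000118785", "ENSG00000118785"),
  c("ENSG00000099864", "ENSG00000174059", "ENSG00000139970")
)

out_dir <- tempdir()
paths <- vapply(seq_len(nrow(toy_gene_list)),
                function(k) write_factor_genes(toy_gene_list[k, ], k, out_dir),
                character(1))
```

In the real analysis the loop would run over the rows of `gene_list` (or over `out$query` after each `queryMany` call) and write to the `../utilities/...` directory instead of `tempdir()`.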

Factor 16 Annotations

out <- mygene::queryMany(gene_list[16,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name summary X_id query symbol notfound
complexin 1 Proteins encoded by the complexin/synaphin gene family are cytosolic proteins that function in synaptic vesicle exocytosis. These proteins bind syntaxin, part of the SNAP receptor. The protein product of this gene binds to the SNAP receptor complex and disrupts it, allowing transmitter release. 10815 ENSG00000168993 CPLX1 NA
secreted phosphoprotein 1 The protein encoded by this gene is involved in the attachment of osteoclasts to the mineralized bone matrix. The encoded protein is secreted and binds hydroxyapatite with high affinity. The osteoclast vitronectin receptor is found in the cell membrane and may be involved in the binding to this protein. This protein is also a cytokine that upregulates expression of interferon-gamma and interleukin-12. Several transcript variants encoding different isoforms have been found for this gene. 6696 ENSG00000118785 SPP1 NA
paralemmin This gene encodes a member of the paralemmin protein family. The product of this gene is a prenylated and palmitoylated phosphoprotein that associates with the cytoplasmic face of plasma membranes and is implicated in plasma membrane dynamics in neurons and other cell types. Several alternatively spliced transcript variants have been identified, but the full-length nature of only two transcript variants has been determined. 5064 ENSG00000099864 PALM NA
CD34 molecule The protein encoded by this gene may play a role in the attachment of stem cells to the bone marrow extracellular matrix or to stromal cells. This single-pass membrane protein is highly glycosylated and phosphorylated by protein kinase C. Two transcript variants encoding different isoforms have been found for this gene. 947 ENSG00000174059 CD34 NA
reticulon 1 This gene belongs to the family of reticulon encoding genes. Reticulons are associated with the endoplasmic reticulum, and are involved in neuroendocrine secretion or in membrane trafficking in neuroendocrine cells. This gene is considered to be a specific marker for neurological diseases and cancer, and is a potential molecular target for therapy. Alternative splicing results in multiple transcript variants. 6252 ENSG00000139970 RTN1 NA
insulin like growth factor 1 The protein encoded by this gene is similar to insulin in function and structure and is a member of a family of proteins involved in mediating growth and development. The encoded protein is processed from a precursor, bound by a specific receptor, and secreted. Defects in this gene are a cause of insulin-like growth factor I deficiency. Alternative splicing results in multiple transcript variants encoding different isoforms that may undergo similar processing to generate mature protein. 3479 ENSG00000017427 IGF1 NA
islet cell autoantigen 1 This gene encodes a protein with an arfaptin homology domain that is found both in the cytosol and as membrane-bound form on the Golgi complex and immature secretory granules. This protein is believed to be an autoantigen in insulin-dependent diabetes mellitus and primary Sjogren’s syndrome. Several transcript variants encoding two different isoforms have been found for this gene. 3382 ENSG00000003147 ICA1 NA
protein kinase cAMP-dependent type II regulatory subunit beta cAMP is a signaling molecule important for a variety of cellular functions. cAMP exerts its effects by activating the cAMP-dependent protein kinase, which transduces the signal through phosphorylation of different target proteins. The inactive kinase holoenzyme is a tetramer composed of two regulatory and two catalytic subunits. cAMP causes the dissociation of the inactive holoenzyme into a dimer of regulatory subunits bound to four cAMP and two free monomeric catalytic subunits. Four different regulatory subunits and three catalytic subunits have been identified in humans. The protein encoded by this gene is one of the regulatory subunits. This subunit can be phosphorylated by the activated catalytic subunit. This subunit has been shown to interact with and suppress the transcriptional activity of the cAMP responsive element binding protein 1 (CREB1) in activated T cells. Knockout studies in mice suggest that this subunit may play an important role in regulating energy balance and adiposity. The studies also suggest that this subunit may mediate the gene induction and cataleptic behavior induced by haloperidol. 5577 ENSG00000005249 PRKAR2B NA
SPARC related modular calcium binding 1 This gene encodes a multi-domain secreted protein that may have a critical role in ocular and limb development. Mutations in this gene are associated with microphthalmia and limb anomalies. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 64093 ENSG00000198732 SMOC1 NA
receptor activity modifying protein 2 The protein encoded by this gene is a member of the RAMP family of single-transmembrane-domain proteins, called receptor (calcitonin) activity modifying proteins (RAMPs). RAMPs are type I transmembrane proteins with an extracellular N terminus and a cytoplasmic C terminus. RAMPs are required to transport calcitonin-receptor-like receptor (CRLR) to the plasma membrane. CRLR, a receptor with seven transmembrane domains, can function as either a calcitonin-gene-related peptide (CGRP) receptor or an adrenomedullin receptor, depending on which members of the RAMP family are expressed. In the presence of this (RAMP2) protein, CRLR functions as an adrenomedullin receptor. The RAMP2 protein is involved in core glycosylation and transportation of adrenomedullin receptor to the cell surface. 10266 ENSG00000131477 RAMP2 NA
NA NA NA ENSG00000271738 NA TRUE
plasmalemma vesicle associated protein NA 83483 ENSG00000130300 PLVAP NA
KN motif and ankyrin repeat domains 3 NA 256949 ENSG00000186994 KANK3 NA
dual specificity phosphatase 8 The protein encoded by this gene is a member of the dual specificity protein phosphatase subfamily. These phosphatases inactivate their target kinases by dephosphorylating both the phosphoserine/threonine and phosphotyrosine residues. They negatively regulate members of the mitogen-activated protein (MAP) kinase superfamily (MAPK/ERK, SAPK/JNK, p38), which is associated with cellular proliferation and differentiation. Different members of the family of dual specificity phosphatases show distinct substrate specificities for various MAP kinases, different tissue distribution and subcellular localization, and different modes of inducibility of their expression by extracellular stimuli. This gene product inactivates SAPK/JNK and p38, is expressed predominantly in the adult brain, heart, and skeletal muscle, is localized in the cytoplasm, and is induced by nerve growth factor and insulin. An intronless pseudogene for DUSP8 is present on chromosome 10q11.2. 1850 ENSG00000184545 DUSP8 NA
RAMP2 antisense RNA 1 NA 100190938 ENSG00000197291 RAMP2-AS1 NA
family with sequence similarity 213 member A NA 84293 ENSG00000122378 FAM213A NA
glutamic-pyruvate transaminase (alanine aminotransferase) This gene encodes cytosolic alanine aminotransaminase 1 (ALT1); also known as glutamate-pyruvate transaminase 1. This enzyme catalyzes the reversible transamination between alanine and 2-oxoglutarate to generate pyruvate and glutamate and, therefore, plays a key role in the intermediary metabolism of glucose and amino acids. Serum activity levels of this enzyme are routinely used as a biomarker of liver injury caused by drug toxicity, infection, alcohol, and steatosis. A related gene on chromosome 16 encodes a putative mitochondrial alanine aminotransaminase. 2875 ENSG00000167701 GPT NA
KCNIP2 antisense RNA 1 NA ENSG00000226009 ENSG00000226009 KCNIP2-AS1 NA
protein phosphatase 1 regulatory inhibitor subunit 1A NA 5502 ENSG00000135447 PPP1R1A NA
fucosyltransferase 1 (H blood group) The protein encoded by this gene is a Golgi stack membrane protein that is involved in the creation of a precursor of the H antigen, which is required for the final step in the soluble A and B antigen synthesis pathway. This gene is one of two encoding the galactoside 2-L-fucosyltransferase enzyme. Mutations in this gene are a cause of the H-Bombay blood group. 2523 ENSG00000174951 FUT1 NA
NA NA ENSG00000260912 ENSG00000260912 RP11-363E7.4 NA
perilipin 1 The protein encoded by this gene coats lipid storage droplets in adipocytes, thereby protecting them until they can be broken down by hormone-sensitive lipase. The encoded protein is the major cAMP-dependent protein kinase substrate in adipocytes and, when unphosphorylated, may play a role in the inhibition of lipolysis. Alternatively spliced transcript variants varying in the 5’ UTR, but encoding the same protein, have been found for this gene. 5346 ENSG00000166819 PLIN1 NA
multimerin 2 This gene encodes a protein belonging to the member of elastin microfibril interface-located (EMILIN) protein family. This family member is an extracellular matrix glycoprotein that can interfere with tumor angiogenesis and growth. It serves as a transforming growth factor beta antagonist and can interfere with the VEGF-A/VEGFR2 pathway. A related pseudogene has been identified on chromosome 6. 79812 ENSG00000173269 MMRN2 NA
notch 4 This gene encodes a member of the NOTCH family of proteins. Members of this Type I transmembrane protein family share structural characteristics including an extracellular domain consisting of multiple epidermal growth factor-like (EGF) repeats, and an intracellular domain consisting of multiple different domain types. Notch signaling is an evolutionarily conserved intercellular signaling pathway that regulates interactions between physically adjacent cells through binding of Notch family receptors to their cognate ligands. The encoded preproprotein is proteolytically processed in the trans-Golgi network to generate two polypeptide chains that heterodimerize to form the mature cell-surface receptor. This receptor may play a role in vascular, renal and hepatic development. Mutations in this gene may be associated with schizophrenia. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. 4855 ENSG00000204301 NOTCH4 NA
solute carrier family 29 member 4 This gene encodes a member of the SLC29A/ENT transporter protein family. The encoded membrane protein catalyzes the reuptake of monoamines into presynaptic neurons, thus determining the intensity and duration of monoamine neural signaling. It has been shown to transport several compounds, including serotonin, dopamine, and the neurotoxin 1-methyl-4-phenylpyridinium. Alternative splicing results in multiple transcript variants. 222962 ENSG00000164638 SLC29A4 NA
CD200 molecule This gene encodes a type I membrane glycoprotein containing two extracellular immunoglobulin domains, a transmembrane and a cytoplasmic domain. This gene is expressed by various cell types, including B cells, a subset of T cells, thymocytes, endothelial cells, and neurons. The encoded protein plays an important role in immunosuppression and regulation of anti-tumor activity. Alternative splicing results in multiple transcript variants encoding different isoforms. 4345 ENSG00000091972 CD200 NA
FXYD domain containing ion transport regulator 6 This gene encodes a member of the FXYD family of transmembrane proteins. This particular protein encodes phosphohippolin, which likely affects the activity of Na,K-ATPase. Multiple alternatively spliced transcript variants encoding the same protein have been described. Related pseudogenes have been identified on chromosomes 10 and X. Read-through transcripts have been observed between this locus and the downstream sodium/potassium-transporting ATPase subunit gamma (FXYD2, GeneID 486) locus. 53826 ENSG00000137726 FXYD6 NA
NA NA ENSG00000254528 ENSG00000254528 RP11-728F11.4 NA
family with sequence similarity 89 member A NA 375061 ENSG00000182118 FAM89A NA
G protein-coupled receptor kinase 3 The beta-adrenergic receptor kinase specifically phosphorylates the agonist-occupied form of the beta-adrenergic and related G protein-coupled receptors. Overall, the beta adrenergic receptor kinase 2 has 85% amino acid similarity with beta adrenergic receptor kinase 1, with the protein kinase catalytic domain having 95% similarity. These data suggest the existence of a family of receptor kinases which may serve broadly to regulate receptor function. 157 ENSG00000100077 GRK3 NA
RNA, U2 small nuclear 2, pseudogene NA ENSG00000222328 ENSG00000222328 RNU2-2P NA
potassium voltage-gated channel interacting protein 2 This gene encodes a member of the family of voltage-gated potassium (Kv) channel-interacting proteins (KCNIPs), which belongs to the recoverin branch of the EF-hand superfamily. Members of the KCNIP family are small calcium binding proteins. They all have EF-hand-like domains, and differ from each other in the N-terminus. They are integral subunit components of native Kv4 channel complexes. They may regulate A-type currents, and hence neuronal excitability, in response to changes in intracellular calcium. Multiple alternatively spliced transcript variants encoding distinct isoforms have been identified from this gene. 30819 ENSG00000120049 KCNIP2 NA
ATPase Na+/K+ transporting subunit alpha 2 The protein encoded by this gene belongs to the family of P-type cation transport ATPases, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The catalytic subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes an alpha 2 subunit. Mutations in this gene result in familial basilar or hemiplegic migraines, and in a rare syndrome known as alternating hemiplegia of childhood. 477 ENSG00000018625 ATP1A2 NA
jagged 2 The Notch signaling pathway is an intercellular signaling mechanism that is essential for proper embryonic development. Members of the Notch gene family encode transmembrane receptors that are critical for various cell fate decisions. The protein encoded by this gene is one of several ligands that activate Notch and related receptors. Two transcript variants encoding different isoforms have been found for this gene. 3714 ENSG00000184916 JAG2 NA
dual specificity phosphatase 4 The protein encoded by this gene is a member of the dual specificity protein phosphatase subfamily. These phosphatases inactivate their target kinases by dephosphorylating both the phosphoserine/threonine and phosphotyrosine residues. They negatively regulate members of the mitogen-activated protein (MAP) kinase superfamily (MAPK/ERK, SAPK/JNK, p38), which are associated with cellular proliferation and differentiation. Different members of the family of dual specificity phosphatases show distinct substrate specificities for various MAP kinases, different tissue distribution and subcellular localization, and different modes of inducibility of their expression by extracellular stimuli. This gene product inactivates ERK1, ERK2 and JNK, is expressed in a variety of tissues, and is localized in the nucleus. Two alternatively spliced transcript variants, encoding distinct isoforms, have been observed for this gene. In addition, multiple polyadenylation sites have been reported. 1846 ENSG00000120875 DUSP4 NA
G protein-coupled receptor 146 NA 115330 ENSG00000164849 GPR146 NA
calcium release activated channel regulator 2B NA 283229 ENSG00000177685 CRACR2B NA
NA NA ENSG00000257607 ENSG00000257607 RP11-449P15.1 NA
microtubule associated tumor suppressor 1 This gene encodes a protein which contains a C-terminal domain able to interact with the angiotensin II (AT2) receptor and a large coiled-coil region allowing dimerization. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. One of the transcript variants has been shown to encode a mitochondrial protein that acts as a tumor suppressor and participates in AT2 signaling pathways. Other variants may encode nuclear or transmembrane proteins but it has not been determined whether they also participate in AT2 signaling pathways. 57509 ENSG00000129422 MTUS1 NA
TYRO3 protein tyrosine kinase The gene is part of a 3-member transmembrane receptor kinase receptor family with a processed pseudogene distal on chromosome 15. The encoded protein is activated by the products of the growth arrest-specific gene 6 and protein S genes and is involved in controlling cell survival and proliferation, spermatogenesis, immunoregulation and phagocytosis. The encoded protein has also been identified as a cell entry factor for Ebola and Marburg viruses. 7301 ENSG00000092445 TYRO3 NA
perilipin 4 Members of the perilipin family, such as PLIN4, coat intracellular lipid storage droplets (Wolins et al., 2003 [PubMed 12840023]). 729359 ENSG00000167676 PLIN4 NA
NA NA ENSG00000258603 ENSG00000258603 RP3-414A15.10 NA
PRKAG2 antisense RNA 1 NA ENSG00000239911 ENSG00000239911 PRKAG2-AS1 NA
calcium/calmodulin dependent protein kinase ID This gene is a member of the calcium/calmodulin-dependent protein kinase 1 family, a subfamily of the serine/threonine kinases. The encoded protein is a component of the calcium-regulated calmodulin-dependent protein kinase cascade. It has been associated with multiple processes including regulation of granulocyte function, activation of CREB-dependent gene transcription, aldosterone synthesis, differentiation and activation of neutrophil cells, and apoptosis of erythroleukemia cells. Alternatively spliced transcript variants encoding different isoforms of this gene have been described. 57118 ENSG00000183049 CAMK1D NA
solute carrier family 16 member 14 NA 151473 ENSG00000163053 SLC16A14 NA
cell adhesion molecule L1 like The protein encoded by this gene is a member of the L1 gene family of neural cell adhesion molecules. It is a neural recognition molecule that may be involved in signal transduction pathways. The deletion of one copy of this gene may be responsible for mental defects in patients with 3p- syndrome. This protein may also play a role in the growth of certain cancers. Alternate splicing results in both coding and non-coding variants. 10752 ENSG00000134121 CHL1 NA
NA NA ENSG00000267992 ENSG00000267992 CTB-189B5.3 NA
nipsnap homolog 3B NIPSNAP3B belongs to a family of proteins with putative roles in vesicular trafficking (Buechler et al., 2004 [PubMed 15177564]). 55335 ENSG00000165028 NIPSNAP3B NA
retinol binding protein 7 Due to its chemical instability and low solubility in aqueous solution, vitamin A requires cellular retinol-binding proteins (CRBPs), such as RBP7, for stability, internalization, intercellular transfer, homeostasis, and metabolism. 116362 ENSG00000162444 RBP7 NA
adenosine A1 receptor The protein encoded by this gene is an adenosine receptor that belongs to the G-protein coupled receptor 1 family. There are 3 types of adenosine receptors, each with a specific pattern of ligand binding and tissue distribution, and together they regulate a diverse set of physiologic functions. The type A1 receptors inhibit adenylyl cyclase, and play a role in the fertilization process. Animal studies also suggest a role for A1 receptors in kidney function and ethanol intoxication. Transcript variants with alternative splicing in the 5’ UTR have been found for this gene. 134 ENSG00000163485 ADORA1 NA
apolipoprotein E The protein encoded by this gene is a major apoprotein of the chylomicron. It binds to a specific liver and peripheral cell receptor, and is essential for the normal catabolism of triglyceride-rich lipoprotein constituents. This gene maps to chromosome 19 in a cluster with the related apolipoprotein C1 and C2 genes. Mutations in this gene result in familial dysbetalipoproteinemia, or type III hyperlipoproteinemia (HLP III), in which increased plasma cholesterol and triglycerides are the consequence of impaired clearance of chylomicron and VLDL remnants. Alternative splicing results in multiple transcript variants. 348 ENSG00000130203 APOE NA
lipase E, hormone sensitive type The protein encoded by this gene has a long and a short form, generated by use of alternative translational start codons. The long form is expressed in steroidogenic tissues such as testis, where it converts cholesteryl esters to free cholesterol for steroid hormone production. The short form is expressed in adipose tissue, among others, where it hydrolyzes stored triglycerides to free fatty acids. 3991 ENSG00000079435 LIPE NA
stum, mechanosensory transduction mediator homolog NA 375057 ENSG00000203685 STUM NA
G protein subunit alpha z The protein encoded by this gene is a member of a G protein subfamily that mediates signal transduction in pertussis toxin-insensitive systems. This encoded protein may play a role in maintaining the ionic balance of perilymphatic and endolymphatic cochlear fluids. 2781 ENSG00000128266 GNAZ NA
F-box protein 27 Members of the F-box protein family, such as FBXO27, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]). 126433 ENSG00000161243 FBXO27 NA
NA NA ENSG00000229299 ENSG00000229299 RP4-583P15.10 NA
SH2 domain containing 3C This gene encodes an adaptor protein and member of a cytoplasmic protein family involved in cell migration. The encoded protein contains a putative Src homology 2 (SH2) domain and guanine nucleotide exchange factor-like domain which allows this signaling protein to form a complex with scaffolding protein Crk-associated substrate. Multiple transcript variants encoding different isoforms have been found for this gene. 10044 ENSG00000095370 SH2D3C NA
integrin subunit alpha 7 The protein encoded by this gene belongs to the integrin alpha chain family. Integrins are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain. They mediate a wide spectrum of cell-cell and cell-matrix interactions, and thus play a role in cell migration, morphologic development, differentiation, and metastasis. This protein functions as a receptor for the basement membrane protein laminin-1. It is mainly expressed in skeletal and cardiac muscles and may be involved in differentiation and migration processes during myogenesis. Defects in this gene are associated with congenital myopathy. Alternatively spliced transcript variants encoding different isoforms have been noted for this gene. 3679 ENSG00000135424 ITGA7 NA
synaptosome associated protein 25 Synaptic vesicle membrane docking and fusion is mediated by SNAREs (soluble N-ethylmaleimide-sensitive factor attachment protein receptors) located on the vesicle membrane (v-SNAREs) and the target membrane (t-SNAREs). The assembled v-SNARE/t-SNARE complex consists of a bundle of four helices, one of which is supplied by v-SNARE and the other three by t-SNARE. For t-SNAREs on the plasma membrane, the protein syntaxin supplies one helix and the protein encoded by this gene contributes the other two. Therefore, this gene product is a presynaptic plasma membrane protein involved in the regulation of neurotransmitter release. Two alternative transcript variants encoding different protein isoforms have been described for this gene. 6616 ENSG00000132639 SNAP25 NA
family with sequence similarity 69 member B This gene encodes a member of the FAM69 family of cysteine-rich type II transmembrane proteins. These proteins localize to the endoplasmic reticulum but their specific functions are unknown. 138311 ENSG00000165716 FAM69B NA
ribosomal protein S20 pseudogene 22 NA ENSG00000239218 ENSG00000239218 RPS20P22 NA
intercellular adhesion molecule 2 The protein encoded by this gene is a member of the intercellular adhesion molecule (ICAM) family. All ICAM proteins are type I transmembrane glycoproteins, contain 2-9 immunoglobulin-like C2-type domains, and bind to the leukocyte adhesion LFA-1 protein. This protein may play a role in lymphocyte recirculation by blocking LFA-1-dependent cell adhesion. It mediates adhesive interactions important for antigen-specific immune response, NK-cell mediated clearance, lymphocyte recirculation, and other cellular interactions important for immune response and surveillance. Several transcript variants encoding the same protein have been found for this gene. 3384 ENSG00000108622 ICAM2 NA
high mobility group nucleosomal binding domain 2 pseudogene 15 NA ENSG00000214578 ENSG00000214578 HMGN2P15 NA
NA NA ENSG00000205959 ENSG00000205959 RP11-689P11.2 NA
SGK2, serine/threonine kinase 2 This gene encodes a serine/threonine protein kinase. Although this gene product is similar to serum- and glucocorticoid-induced protein kinase (SGK), this gene is not induced by serum or glucocorticoids. This gene is induced in response to signals that activate phosphatidylinositol 3-kinase, which is also true for SGK. Alternative splicing results in multiple transcript variants. 10110 ENSG00000101049 SGK2 NA
alkaline ceramidase 2 The sphingolipid metabolite sphingosine-1-phosphate promotes cell proliferation and survival, whereas its precursor, sphingosine, has the opposite effect. The ceramidase ACER2 hydrolyzes very long chain ceramides to generate sphingosine (Xu et al., 2006 [PubMed 16940153]). 340485 ENSG00000177076 ACER2 NA
phospholipase A2 group XVI NA 11145 ENSG00000176485 PLA2G16 NA
NA NA NA ENSG00000256604 NA TRUE
NA NA ENSG00000272678 ENSG00000272678 RP11-797D24.4 NA
NA NA ENSG00000257622 ENSG00000257622 RP11-44N21.4 NA
fatty acid binding protein 4 FABP4 encodes the fatty acid binding protein found in adipocytes. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. It is thought that FABPs roles include fatty acid uptake, transport, and metabolism. 2167 ENSG00000170323 FABP4 NA
GTPase, IMAP family member 5 This gene encodes a protein belonging to the GTP-binding superfamily and to the immuno-associated nucleotide (IAN) subfamily of nucleotide-binding proteins. In humans, the IAN subfamily genes are located in a cluster at 7q36.1. This gene encodes an antiapoptotic protein that functions in T-cell survival. Polymorphisms in this gene are associated with systemic lupus erythematosus. Read-through transcription exists between this gene and the neighboring upstream GIMAP1 (GTPase, IMAP family member 1) gene. 55340 ENSG00000196329 GIMAP5 NA
carboxypeptidase X (M14 family), member 1 This gene likely encodes a member of the carboxypeptidase family of proteins. Cloning of a comparable locus in mouse indicates that the encoded protein contains a discoidin domain and a carboxypeptidase domain, but the protein appears to lack residues necessary for carboxypeptidase activity. 56265 ENSG00000088882 CPXM1 NA
A2ML1 antisense RNA 1 NA ENSG00000256661 ENSG00000256661 A2ML1-AS1 NA
acetyl-CoA carboxylase beta Acetyl-CoA carboxylase (ACC) is a complex multifunctional enzyme system. ACC is a biotin-containing enzyme which catalyzes the carboxylation of acetyl-CoA to malonyl-CoA, the rate-limiting step in fatty acid synthesis. ACC-beta is thought to control fatty acid oxidation by means of the ability of malonyl-CoA to inhibit carnitine-palmitoyl-CoA transferase I, the rate-limiting step in fatty acid uptake and oxidation by mitochondria. ACC-beta may be involved in the regulation of fatty acid oxidation, rather than fatty acid biosynthesis. There is evidence for the presence of two ACC-beta isoforms. 32 ENSG00000076555 ACACB NA
hes related family bHLH transcription factor with YRPW motif 1 This gene encodes a nuclear protein belonging to the hairy and enhancer of split-related (HESR) family of basic helix-loop-helix (bHLH)-type transcriptional repressors. Expression of this gene is induced by the Notch and c-Jun signal transduction pathways. Two similar and redundant genes in mouse are required for embryonic cardiovascular development, and are also implicated in neurogenesis and somitogenesis. Alternative splicing results in multiple transcript variants. 23462 ENSG00000164683 HEY1 NA
NA NA ENSG00000225792 ENSG00000225792 AC004540.4 NA
NEDD4 binding protein 2-like 1 NA 90634 ENSG00000139597 N4BP2L1 NA
NA NA NA ENSG00000268358 NA TRUE
NA NA ENSG00000256633 ENSG00000256633 RP11-169D4.2 NA
synuclein alpha interacting protein This gene encodes a protein containing several protein-protein interaction domains, including ankyrin-like repeats, a coiled-coil domain, and an ATP/GTP-binding motif. The encoded protein interacts with alpha-synuclein in neuronal tissue and may play a role in the formation of cytoplasmic inclusions and neurodegeneration. A mutation in this gene has been associated with Parkinson’s disease. Alternative splicing results in multiple transcript variants. 9627 ENSG00000064692 SNCAIP NA
uncharacterized LOC102724229 NA 102724229 ENSG00000105808 LOC102724229 NA
RAS p21 protein activator 4 This gene encodes a member of the GAP1 family of GTPase-activating proteins that suppresses the Ras/mitogen-activated protein kinase pathway in response to Ca(2+). Stimuli that increase intracellular Ca(2+) levels result in the translocation of this protein to the plasma membrane, where it activates Ras GTPase activity. Consequently, Ras is converted from the active GTP-bound state to the inactive GDP-bound state and no longer activates downstream pathways that regulate gene expression, cell growth, and differentiation. Multiple transcript variants encoding different isoforms have been found for this gene. 10156 ENSG00000105808 RASA4 NA
parathyroid hormone 1 receptor The protein encoded by this gene is a member of the G-protein coupled receptor family 2. This protein is a receptor for parathyroid hormone (PTH) and for parathyroid hormone-like hormone (PTHLH). The activity of this receptor is mediated by G proteins which activate adenylyl cyclase and also a phosphatidylinositol-calcium second messenger system. Defects in this receptor are known to be the cause of Jansen’s metaphyseal chondrodysplasia (JMC), chondrodysplasia Blomstrand type (BOCD), as well as enchondromatosis. Two transcript variants encoding the same protein have been found for this gene. 5745 ENSG00000160801 PTH1R NA
ATP binding cassette subfamily A member 3 The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intracellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the ABC1 subfamily. Members of the ABC1 subfamily comprise the only major ABC subfamily found exclusively in multicellular eukaryotes. The full transporter encoded by this gene may be involved in development of resistance to xenobiotics and engulfment during programmed cell death. 21 ENSG00000167972 ABCA3 NA
naked cuticle homolog 1 In the mouse, Nkd is a Dishevelled (see DVL1; MIM 601365)-binding protein that functions as a negative regulator of the Wnt (see WNT1; MIM 164820)-beta-catenin (see MIM 116806)-Tcf (see MIM 602272) signaling pathway. 85407 ENSG00000140807 NKD1 NA
long intergenic non-protein coding RNA 987 NA 100499405 ENSG00000237248 LINC00987 NA
neuromedin B This gene encodes a member of the bombesin-like family of neuropeptides, which negatively regulate eating behavior. The encoded protein may regulate colonic smooth muscle contraction through binding to its cognate receptor, the neuromedin B receptor (NMBR). Polymorphisms of this gene may be associated with hunger, weight gain and obesity. Alternative splicing results in multiple transcript variants. 4828 ENSG00000197696 NMB NA
nuclear receptor subfamily 1 group D member 1 This gene encodes a transcription factor that is a member of the nuclear receptor subfamily 1. The encoded protein is a ligand-sensitive transcription factor that negatively regulates the expression of core clock proteins. In particular this protein represses the circadian clock transcription factor aryl hydrocarbon receptor nuclear translocator-like protein 1 (ARNTL). This protein may also be involved in regulating genes that function in metabolic, inflammatory and cardiovascular processes. 9572 ENSG00000126368 NR1D1 NA
G protein-coupled receptor class C group 5 member C The protein encoded by this gene is a member of the type 3 G protein-coupled receptor family. Members of this superfamily are characterized by a signature 7-transmembrane domain motif. The specific function of this protein is unknown; however, this protein may mediate the cellular effects of retinoic acid on the G protein signal transduction cascade. Two transcript variants encoding different isoforms have been found for this gene. 55890 ENSG00000170412 GPRC5C NA
adipogenesis regulatory factor APM2 gene is exclusively expressed in adipose tissue. Its function is currently unknown. 10974 ENSG00000148671 ADIRF NA
perilipin 5 Members of the perilipin family, such as PLIN5, coat intracellular lipid storage droplets and protect them from lipolytic degradation (Dalen et al., 2007 [PubMed 17234449]). 440503 ENSG00000214456 PLIN5 NA
phosphoinositide-3-kinase regulatory subunit 3 NA 8503 ENSG00000117461 PIK3R3 NA
phosphodiesterase 1B The protein encoded by this gene belongs to the cyclic nucleotide phosphodiesterase (PDE) family, and PDE1 subfamily. Members of the PDE1 family are calmodulin-dependent PDEs that are stimulated by a calcium-calmodulin complex. This PDE has dual-specificity for the second messengers, cAMP and cGMP, with a preference for cGMP as a substrate. cAMP and cGMP function as key regulators of many important physiological processes. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. 5153 ENSG00000123360 PDE1B NA
dpy-19 like 2 The protein encoded by this gene belongs to the dpy-19 family. It is highly expressed in testis, and is required for sperm head elongation and acrosome formation during spermatogenesis. Mutations in this gene are associated with an infertility disorder, spermatogenic failure type 9 (SPGF9). 283417 ENSG00000177990 DPY19L2 NA
semaphorin 6A The transmembrane semaphorin SEMA6A is expressed in developing neural tissue and is required for proper development of the thalamocortical projection (Leighton et al., 2001 [PubMed 11242070]). 57556 ENSG00000092421 SEMA6A NA
PDZ domain containing 2 The protein encoded by this gene contains six PDZ domains and shares sequence similarity with pro-interleukin-16 (pro-IL-16). Like pro-IL-16, the encoded protein localizes to the endoplasmic reticulum and is thought to be cleaved by a caspase to produce a secreted peptide containing two PDZ domains. In addition, this gene is upregulated in primary prostate tumors and may be involved in the early stages of prostate tumorigenesis. 23037 ENSG00000133401 PDZD2 NA
ankyrin 2, neuronal This gene encodes a member of the ankyrin family of proteins that link the integral membrane proteins to the underlying spectrin-actin cytoskeleton. Ankyrins play key roles in activities such as cell motility, activation, proliferation, contact and the maintenance of specialized membrane domains. Most ankyrins are typically composed of three structural domains: an amino-terminal domain containing multiple ankyrin repeats; a central region with a highly conserved spectrin binding domain; and a carboxy-terminal regulatory domain which is the least conserved and subject to variation. The protein encoded by this gene is required for targeting and stability of Na/Ca exchanger 1 in cardiomyocytes. Mutations in this gene cause long QT syndrome 4 and cardiac arrhythmia syndrome. Multiple transcript variants encoding different isoforms have been described. 287 ENSG00000145362 ANK2 NA
NA NA NA ENSG00000156750 NA TRUE
regulator of calcineurin 2 This gene encodes a member of the regulator of calcineurin (RCAN) protein family. These proteins play a role in many physiological processes by binding to the catalytic domain of calcineurin A, inhibiting calcineurin-mediated nuclear translocation of the transcription factor NFATC1. Expression of this gene in skin fibroblasts is upregulated by thyroid hormone, and the encoded protein may also play a role in endothelial cell function and angiogenesis. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 10231 ENSG00000172348 RCAN2 NA
spectrin beta, non-erythrocytic 4 Spectrin is an actin crosslinking and molecular scaffold protein that links the plasma membrane to the actin cytoskeleton, and functions in the determination of cell shape, arrangement of transmembrane proteins, and organization of organelles. It is composed of two antiparallel dimers of alpha- and beta- subunits. This gene is one member of a family of beta-spectrin genes. The encoded protein localizes to the nuclear matrix, PML nuclear bodies, and cytoplasmic vesicles. A highly similar gene in the mouse is required for localization of specific membrane proteins in polarized regions of neurons. Multiple transcript variants encoding different isoforms have been found for this gene. 57731 ENSG00000160460 SPTBN4 NA
# Save the Ensembl IDs queried for factor 16, one per line
write.table(out$query,
            paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 16, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
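
The Factor 16 table above contains queries that matched no gene (`notfound` is `TRUE`, e.g. ENSG00000268358) and one query, ENSG00000105808, that returned two hits (LOC102724229 and RASA4). Writing the `query` column as-is therefore carries unmatched and repeated IDs into the gene list. A minimal sketch of filtering before writing follows, assuming the annotations behave like a data frame with `query` and `notfound` columns. The helper name `clean_queries` and the small stand-in data frame are illustrative only and are not part of the original analysis.

```r
# Hypothetical helper: keep each matched query ID once, in original order.
clean_queries <- function(ann) {
  # notfound is NA for matched queries and TRUE for unmatched ones
  found <- is.na(ann$notfound) | !ann$notfound
  unique(ann$query[found])
}

# Stand-in for as.data.frame(out), using four rows from the Factor 16 table
ann <- data.frame(
  query    = c("ENSG00000139597", "ENSG00000268358",
               "ENSG00000105808", "ENSG00000105808"),
  symbol   = c("N4BP2L1", NA, "LOC102724229", "RASA4"),
  notfound = c(NA, TRUE, NA, NA),
  stringsAsFactors = FALSE
)

ids <- clean_queries(ann)

# Written to a temporary file here; the analysis itself writes under ../utilities/
out_file <- tempfile(fileext = ".txt")
write.table(ids, out_file, col.names = FALSE, row.names = FALSE, quote = FALSE)
```

Whether unmatched IDs should be dropped depends on how the downstream tool treats them; keeping them in a separate file is an equally reasonable choice.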

Factor 17 Annotations

out <- mygene::queryMany(gene_list[17, ], scopes = "ensembl.gene",
                         fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol query summary name X_id notfound
CLPS ENSG00000137392 The protein encoded by this gene is a cofactor needed by pancreatic lipase for efficient dietary lipid hydrolysis. It binds to the C-terminal, non-catalytic domain of lipase, thereby stabilizing an active conformation and considerably increasing the overall hydrophobic binding site. The gene product allows lipase to anchor noncovalently to the surface of lipid micelles, counteracting the destabilizing influence of intestinal bile salts. This cofactor is only expressed in pancreatic acinar cells, suggesting regulation of expression by tissue-specific elements. Three transcript variants encoding different isoforms have been found for this gene. colipase 1208 NA
PDIA2 ENSG00000185615 Protein disulfide isomerases (EC 5.3.4.1), such as PDIP, are endoplasmic reticulum (ER) resident proteins that catalyze protein folding and thiol-disulfide interchange reactions (Desilva et al., 1996 [PubMed 8561901]). protein disulfide isomerase family A member 2 64714 NA
REG1B ENSG00000172023 This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. regenerating family member 1 beta 5968 NA
CTRB2 ENSG00000168928 NA chymotrypsinogen B2 440387 NA
PNLIPRP1 ENSG00000187021 NA pancreatic lipase related protein 1 5407 NA
CELA3B ENSG00000219073 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3B has little elastolytic activity. Like most of the human elastases, elastase 3B is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3B preferentially cleaves proteins after alanine residues. Elastase 3B may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1, and excretion of this protein in fecal material is frequently used as a measure of pancreatic function in clinical assays. chymotrypsin like elastase family member 3B 23436 NA
PNLIP ENSG00000175535 This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. pancreatic lipase 5406 NA
NA ENSG00000250606 NA NA NA TRUE
CELA3A ENSG00000142789 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. chymotrypsin like elastase family member 3A 10136 NA
SYCN ENSG00000179751 NA syncollin 342898 NA
CTRB1 ENSG00000168925 The protein encoded by this gene is one of a family of serine proteases that is secreted into the gastrointestinal tract as an inactive precursor, which is activated by proteolytic cleavage with trypsin. chymotrypsinogen B1 1504 NA
REG3A ENSG00000172016 This gene encodes a pancreatic secretory protein that may be involved in cell proliferation or differentiation. It has similarity to the C-type lectin superfamily. The enhanced expression of this gene is observed during pancreatic inflammation and liver carcinogenesis. The mature protein also functions as an antimicrobial protein with antibacterial activity. Alternate splicing results in multiple transcript variants that encode the same protein. regenerating family member 3 alpha 5068 NA
RAP1GAP ENSG00000076864 This gene encodes a type of GTPase-activating-protein (GAP) that down-regulates the activity of the ras-related RAP1 protein. RAP1 acts as a molecular switch by cycling between an inactive GDP-bound form and an active GTP-bound form. The product of this gene, RAP1GAP, promotes the hydrolysis of bound GTP and hence returns RAP1 to the inactive state whereas other proteins, guanine nucleotide exchange factors (GEFs), act as RAP1 activators by facilitating the conversion of RAP1 from the GDP- to the GTP-bound form. In general, ras subfamily proteins, such as RAP1, play key roles in receptor-linked signaling pathways that control cell growth and differentiation. RAP1 plays a role in diverse processes such as cell proliferation, adhesion, differentiation, and embryogenesis. Alternative splicing results in multiple transcript variants encoding distinct proteins. RAP1 GTPase activating protein 5909 NA
MYH2 ENSG00000125414 Myosins are actin-based motor proteins that function in the generation of mechanical force in eukaryotic cells. Muscle myosins are heterohexamers composed of 2 myosin heavy chains and 2 pairs of nonidentical myosin light chains. This gene encodes a member of the class II or conventional myosin heavy chains, and functions in skeletal muscle contraction. This gene is found in a cluster of myosin heavy chain genes on chromosome 17. A mutation in this gene results in inclusion body myopathy-3. Multiple alternatively spliced variants, encoding the same protein, have been identified. myosin, heavy chain 2, skeletal muscle, adult 4620 NA
CPA1 ENSG00000091704 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. carboxypeptidase A1 1357 NA
PRSS1 ENSG00000204983 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. protease, serine 1 5644 NA
RAB23 ENSG00000112210 This gene encodes a small GTPase of the Ras superfamily. Rab proteins are involved in the regulation of diverse cellular functions associated with intracellular membrane trafficking, including autophagy and immune response to bacterial infection. The encoded protein may play a role in central nervous system development by antagonizing sonic hedgehog signaling. Disruption of this gene has been implicated in Carpenter syndrome as well as cancer. Alternative splicing results in multiple transcript variants. RAB23, member RAS oncogene family 51715 NA
KCNQ4 ENSG00000117013 The protein encoded by this gene forms a potassium channel that is thought to play a critical role in the regulation of neuronal excitability, particularly in sensory cells of the cochlea. The current generated by this channel is inhibited by M1 muscarinic acetylcholine receptors and activated by retigabine, a novel anti-convulsant drug. The encoded protein can form a homomultimeric potassium channel or possibly a heteromultimeric channel in association with the protein encoded by the KCNQ3 gene. Defects in this gene are a cause of nonsyndromic sensorineural deafness type 2 (DFNA2), an autosomal dominant form of progressive hearing loss. Two transcript variants encoding different isoforms have been found for this gene. potassium voltage-gated channel subfamily Q member 4 9132 NA
COL4A2 ENSG00000134871 This gene encodes one of the six subunits of type IV collagen, the major structural component of basement membranes. The C-terminal portion of the protein, known as canstatin, is an inhibitor of angiogenesis and tumor growth. Like the other members of the type IV collagen gene family, this gene is organized in a head-to-head conformation with another type IV collagen gene so that each gene pair shares a common promoter. collagen type IV alpha 2 1284 NA
CELA2A ENSG00000142615 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Like most of the human elastases, elastase 2A is secreted from the pancreas as a zymogen. In other species, elastase 2A has been shown to preferentially cleave proteins after leucine, methionine, and phenylalanine residues. chymotrypsin like elastase family member 2A 63036 NA
SYNPO2 ENSG00000172403 NA synaptopodin 2 171024 NA
MLK7-AS1 ENSG00000238133 NA MLK7 antisense RNA 1 339751 NA
GP2 ENSG00000169347 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. glycoprotein 2 2813 NA
GEM ENSG00000164949 The protein encoded by this gene belongs to the RAD/GEM family of GTP-binding proteins. It is associated with the inner face of the plasma membrane and could play a role as a regulatory protein in receptor-mediated signal transduction. Alternative splicing occurs at this locus and two transcript variants encoding the same protein have been identified. GTP binding protein overexpressed in skeletal muscle 2669 NA
CYP2S1 ENSG00000167600 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. In rodents, the homologous protein has been shown to metabolize certain carcinogens; however, the specific function of the human protein has not been determined. cytochrome P450 family 2 subfamily S member 1 29785 NA
IL1R2 ENSG00000115590 The protein encoded by this gene is a cytokine receptor that belongs to the interleukin 1 receptor family. This protein binds interleukin alpha (IL1A), interleukin beta (IL1B), and interleukin 1 receptor, type I (IL1R1/IL1RA), and acts as a decoy receptor that inhibits the activity of its ligands. Interleukin 4 (IL4) is reported to antagonize the activity of interleukin 1 by inducing the expression and release of this cytokine. This gene and three other genes form a cytokine receptor gene cluster on chromosome 2q12. Alternative splicing results in multiple transcript variants and protein isoforms. Alternative splicing produces both membrane-bound and soluble proteins. A soluble protein is also produced by proteolytic cleavage. interleukin 1 receptor type 2 7850 NA
EGLN3 ENSG00000129521 NA egl-9 family hypoxia inducible factor 3 112399 NA
SDC4 ENSG00000124145 The protein encoded by this gene is a transmembrane (type I) heparan sulfate proteoglycan that functions as a receptor in intracellular signaling. The encoded protein is found as a homodimer and is a member of the syndecan proteoglycan family. This gene is found on chromosome 20, while a pseudogene has been found on chromosome 22. syndecan 4 6385 NA
LOC105370792 ENSG00000174171 NA uncharacterized LOC105370792 105370792 NA
PLA2G1B ENSG00000170890 This gene encodes a secreted member of the phospholipase A2 (PLA2) class of enzymes, which is produced by the pancreatic acinar cells. The encoded calcium-dependent enzyme catalyzes the hydrolysis of the sn-2 position of membrane glycerophospholipids to release arachidonic acid (AA) and lysophospholipids. AA is subsequently converted by downstream metabolic enzymes to several bioactive lipophilic compounds (eicosanoids), including prostaglandins (PGs) and leukotrienes (LTs). The enzyme may be involved in several physiological processes including cell contraction, cell proliferation and pathological response. phospholipase A2 group IB 5319 NA
CACNB1 ENSG00000067191 The protein encoded by this gene belongs to the calcium channel beta subunit family. It plays an important role in the calcium channel by modulating G protein inhibition, increasing peak calcium current, controlling the alpha-1 subunit membrane targeting and shifting the voltage dependence of activation and inactivation. Alternative splicing occurs at this locus and three transcript variants encoding three distinct isoforms have been identified. calcium voltage-gated channel auxiliary subunit beta 1 782 NA
COX7A1 ENSG00000161281 Cytochrome c oxidase (COX), the terminal component of the mitochondrial respiratory chain, catalyzes the electron transfer from reduced cytochrome c to oxygen. This component is a heteromeric complex consisting of 3 catalytic subunits encoded by mitochondrial genes and multiple structural subunits encoded by nuclear genes. The mitochondrially-encoded subunits function in electron transfer, and the nuclear-encoded subunits may function in the regulation and assembly of the complex. This nuclear gene encodes polypeptide 1 (muscle isoform) of subunit VIIa and the polypeptide 1 is present only in muscle tissues. Other polypeptides of subunit VIIa are present in both muscle and nonmuscle tissues, and are encoded by different genes. cytochrome c oxidase subunit 7A1 1346 NA
RGMB ENSG00000174136 RGMB is a glycosylphosphatidylinositol (GPI)-anchored member of the repulsive guidance molecule family (see RGMA, MIM 607362) and contributes to the patterning of the developing nervous system (Samad et al., 2005 [PubMed 15671031]). repulsive guidance molecule family member b 285704 NA
CD52 ENSG00000169442 NA CD52 molecule 1043 NA
CORO1C ENSG00000110880 This gene encodes a member of the WD repeat protein family. WD repeats are minimally conserved regions of approximately 40 amino acids typically bracketed by gly-his and trp-asp (GH-WD), which may facilitate formation of heterotrimeric or multiprotein complexes. Members of this family are involved in a variety of cellular processes, including cell cycle progression, signal transduction, apoptosis, and gene regulation. Three transcript variants encoding two different isoforms have been found for this gene. coronin 1C 23603 NA
COLCA2 ENSG00000214290 NA colorectal cancer associated 2 120376 NA
DNAJB5 ENSG00000137094 DNAJB5 belongs to the evolutionarily conserved DNAJ/HSP40 protein family. For background information on the DNAJ family, see MIM 608375. DnaJ heat shock protein family (Hsp40) member B5 25822 NA
HAPLN3 ENSG00000140511 This gene belongs to the hyaluronan and proteoglycan binding link protein gene family. The protein encoded by this gene may function in hyaluronic acid binding and cell adhesion. hyaluronan and proteoglycan link protein 3 145864 NA
COLEC12 ENSG00000158270 This gene encodes a member of the C-lectin family, proteins that possess collagen-like sequences and carbohydrate recognition domains. This protein is a scavenger receptor, a cell surface glycoprotein that displays several functions associated with host defense. It can bind to carbohydrate antigens on microorganisms, facilitating their recognition and removal. It also mediates the recognition, internalization, and degradation of oxidatively modified low density lipoprotein by vascular endothelial cells. collectin subfamily member 12 81035 NA
CTRL ENSG00000141086 NA chymotrypsin like 1506 NA
COL6A1 ENSG00000142156 The collagens are a superfamily of proteins that play a role in maintaining the integrity of various tissues. Collagens are extracellular matrix proteins and have a triple-helical domain as their common structural element. Collagen VI is a major structural component of microfibrils. The basic structural unit of collagen VI is a heterotrimer of the alpha1(VI), alpha2(VI), and alpha3(VI) chains. The alpha2(VI) and alpha3(VI) chains are encoded by the COL6A2 and COL6A3 genes, respectively. The protein encoded by this gene is the alpha 1 subunit of type VI collagen (alpha1(VI) chain). Mutations in the genes that code for the collagen VI subunits result in the autosomal dominant disorder, Bethlem myopathy. collagen type VI alpha 1 1291 NA
PRKG1 ENSG00000185532 Mammals have three different isoforms of cyclic GMP-dependent protein kinase (Ialpha, Ibeta, and II). These PRKG isoforms act as key mediators of the nitric oxide/cGMP signaling pathway and are important components of many signal transduction processes in diverse cell types. This PRKG1 gene on human chromosome 10 encodes the soluble Ialpha and Ibeta isoforms of PRKG by alternative transcript splicing. A separate gene on human chromosome 4, PRKG2, encodes the membrane-bound PRKG isoform II. The PRKG1 proteins play a central role in regulating cardiovascular and neuronal functions in addition to relaxing smooth muscle tone, preventing platelet aggregation, and modulating cell growth. This gene is most strongly expressed in all types of smooth muscle, platelets, cerebellar Purkinje cells, hippocampal neurons, and the lateral amygdala. Isoforms Ialpha and Ibeta have identical cGMP-binding and catalytic domains but differ in their leucine/isoleucine zipper and autoinhibitory sequences and therefore differ in their dimerization substrates and kinase enzyme activity. protein kinase, cGMP-dependent, type I 5592 NA
DAPP1 ENSG00000070190 NA dual adaptor of phosphotyrosine and 3-phosphoinositides 1 27071 NA
PLEKHO1 ENSG00000023902 NA pleckstrin homology domain containing O1 51177 NA
LOC101928445 ENSG00000244945 NA uncharacterized LOC101928445 101928445 NA
SVIL-AS1 ENSG00000224597 NA SVIL antisense RNA 1 102724316 NA
CTC-467M3.1 ENSG00000245864 NA NA ENSG00000245864 NA
CGA ENSG00000135346 The four human glycoprotein hormones chorionic gonadotropin (CG), luteinizing hormone (LH), follicle stimulating hormone (FSH), and thyroid stimulating hormone (TSH) are dimers consisting of alpha and beta subunits that are associated noncovalently. The alpha subunits of these hormones are identical, however, their beta chains are unique and confer biological specificity. The protein encoded by this gene is the alpha subunit and belongs to the glycoprotein hormones alpha chain family. Two transcript variants encoding different isoforms have been found for this gene. glycoprotein hormones, alpha polypeptide 1081 NA
FBXO30 ENSG00000118496 This gene encodes a member of the F-box protein family which is characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of the ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into 3 classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbxs class and it is upregulated in nasopharyngeal carcinoma. F-box protein 30 84085 NA
CPA2 ENSG00000158516 Three different forms of human pancreatic procarboxypeptidase A have been isolated. The encoded protein represents the A2 form, which is a monomeric protein with different biochemical properties from the A1 and A3 forms. The A2 form of pancreatic procarboxypeptidase acts on aromatic C-terminal residues and is a secreted protein. carboxypeptidase A2 1358 NA
PPP1CB ENSG00000213639 The protein encoded by this gene is one of the three catalytic subunits of protein phosphatase 1 (PP1). PP1 is a serine/threonine specific protein phosphatase known to be involved in the regulation of a variety of cellular processes, such as cell division, glycogen metabolism, muscle contractility, protein synthesis, and HIV-1 viral transcription. Mouse studies suggest that PP1 functions as a suppressor of learning and memory. Two alternatively spliced transcript variants encoding distinct isoforms have been observed. protein phosphatase 1 catalytic subunit beta 5500 NA
ZEB1 ENSG00000148516 This gene encodes a zinc finger transcription factor. The encoded protein likely plays a role in transcriptional repression of interleukin 2. Mutations in this gene have been associated with posterior polymorphous corneal dystrophy-3 and late-onset Fuchs endothelial corneal dystrophy. Alternatively spliced transcript variants encoding different isoforms have been described. zinc finger E-box binding homeobox 1 6935 NA
CPB1 ENSG00000153002 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma. carboxypeptidase B1 1360 NA
BAG2 ENSG00000112208 BAG proteins compete with Hip for binding to the Hsc70/Hsp70 ATPase domain and promote substrate release. All the BAG proteins have an approximately 45-amino acid BAG domain near the C terminus but differ markedly in their N-terminal regions. The predicted BAG2 protein contains 211 amino acids. The BAG domains of BAG1, BAG2, and BAG3 interact specifically with the Hsc70 ATPase domain in vitro and in mammalian cells. All 3 proteins bind with high affinity to the ATPase domain of Hsc70 and inhibit its chaperone activity in a Hip-repressible manner. BCL2 associated athanogene 2 9532 NA
APOL1 ENSG00000100342 This gene encodes a secreted high density lipoprotein which binds to apolipoprotein A-I. Apolipoprotein A-I is a relatively abundant plasma protein and is the major apoprotein of HDL. It is involved in the formation of most cholesteryl esters in plasma and also promotes efflux of cholesterol from cells. This apolipoprotein L family member may play a role in lipid exchange and transport throughout the body, as well as in reverse cholesterol transport from peripheral cells to the liver. Several different transcript variants encoding different isoforms have been found for this gene. apolipoprotein L1 8542 NA
MAST2 ENSG00000086015 NA microtubule associated serine/threonine kinase 2 23139 NA
A2M ENSG00000175899 Alpha-2-macroglobulin is a protease inhibitor and cytokine transporter. It inhibits many proteases, including trypsin, thrombin and collagenase. A2M is implicated in Alzheimer disease (AD) due to its ability to mediate the clearance and degradation of A-beta, the major component of beta-amyloid deposits. alpha-2-macroglobulin 2 NA
NA ENSG00000152268 NA NA NA TRUE
RP11-64B16.2 ENSG00000213144 NA NA ENSG00000213144 NA
LRRFIP1 ENSG00000124831 NA leucine rich repeat (in FLII) interacting protein 1 9208 NA
AKAP1 ENSG00000121057 The A-kinase anchor proteins (AKAPs) are a group of structurally diverse proteins, which have the common function of binding to the regulatory subunit of protein kinase A (PKA) and confining the holoenzyme to discrete locations within the cell. This gene encodes a member of the AKAP family. The encoded protein binds to type I and type II regulatory subunits of PKA and anchors them to the mitochondrion. This protein is speculated to be involved in the cAMP-dependent signal transduction pathway and in directing RNA to a specific cellular compartment. A-kinase anchoring protein 1 8165 NA
HSPB7 ENSG00000173641 NA heat shock protein family B (small) member 7 27129 NA
MST1R ENSG00000164078 This gene encodes a cell surface receptor for macrophage-stimulating protein (MSP) with tyrosine kinase activity. The mature form of this protein is a heterodimer of disulfide-linked alpha and beta subunits, generated by proteolytic cleavage of a single-chain precursor. The beta subunit undergoes tyrosine phosphorylation upon stimulation by MSP. This protein is expressed on the ciliated epithelia of the mucociliary transport apparatus of the lung, and together with MSP, thought to be involved in host defense. Alternative splicing generates multiple transcript variants encoding different isoforms that may undergo similar proteolytic processing. macrophage stimulating 1 receptor 4486 NA
COL4A1 ENSG00000187498 This gene encodes a type IV collagen alpha protein. Type IV collagen proteins are integral components of basement membranes. This gene shares a bidirectional promoter with a paralogous gene on the opposite strand. The protein consists of an amino-terminal 7S domain, a triple-helix forming collagenous domain, and a carboxy-terminal non-collagenous domain. It functions as part of a heterotrimer and interacts with other extracellular matrix components such as perlecans, proteoglycans, and laminins. In addition, proteolytic cleavage of the non-collagenous carboxy-terminal domain results in a biologically active fragment known as arresten, which has anti-angiogenic and tumor suppressor properties. Mutations in this gene cause porencephaly, cerebrovascular disease, and renal and muscular defects. Alternative splicing results in multiple transcript variants. collagen type IV alpha 1 chain 1282 NA
PDLIM1 ENSG00000107438 This gene encodes a member of the enigma protein family. The protein contains two protein interacting domains, a PDZ domain at the amino terminal end and one to three LIM domains at the carboxyl terminal. It is a cytoplasmic protein associated with the cytoskeleton. The protein may function as an adapter to bring other LIM-interacting proteins to the cytoskeleton. Pseudogenes associated with this gene are located on chromosomes 3, 14 and 17. PDZ and LIM domain 1 9124 NA
PARD3 ENSG00000148498 This gene encodes a member of the PARD protein family. PARD family members interact with other PARD family members and other proteins; they affect asymmetrical cell division and direct polarized cell growth. Multiple alternatively spliced transcript variants have been described for this gene. par-3 family cell polarity regulator 56288 NA
PPP1R3C ENSG00000119938 This gene encodes a regulatory subunit of protein phosphatase-1 (PP1). PP1 catalyzes reversible protein phosphorylation, which is important in a wide range of cellular activities: neuronal, muscular, RNA splicing, protein synthesis, cell death, and glycogen metabolism, to name just a few. By interacting with different regulatory subunits, PP1 is directed to different parts of the cell, to different substrates, or to respond to extracellular signals. protein phosphatase 1 regulatory subunit 3C 5507 NA
KIT ENSG00000157404 This gene encodes the human homolog of the proto-oncogene c-kit. C-kit was first identified as the cellular homolog of the feline sarcoma viral oncogene v-kit. This protein is a type 3 transmembrane receptor for MGF (mast cell growth factor, also known as stem cell factor). Mutations in this gene are associated with gastrointestinal stromal tumors, mast cell disease, acute myelogenous leukemia, and piebaldism. Multiple transcript variants encoding different isoforms have been found for this gene. KIT proto-oncogene receptor tyrosine kinase 3815 NA
FLVCR2 ENSG00000119686 This gene encodes a member of the major facilitator superfamily. The encoded transmembrane protein is a calcium transporter. Unlike the related protein feline leukemia virus subgroup C receptor 1, the protein encoded by this locus does not bind to feline leukemia virus subgroup C envelope protein. The encoded protein may play a role in development of brain vascular endothelial cells, as mutations at this locus have been associated with proliferative vasculopathy and hydranencephaly-hydrocephaly syndrome. Alternatively spliced transcript variants have been described. feline leukemia virus subgroup C cellular receptor family member 2 55640 NA
THY1 ENSG00000154096 This gene encodes a cell surface glycoprotein and member of the immunoglobulin superfamily of proteins. The encoded protein is involved in cell adhesion and cell communication in numerous cell types, but particularly in cells of the immune and nervous systems. The encoded protein is widely used as a marker for hematopoietic stem cells. This gene may function as a tumor suppressor in nasopharyngeal carcinoma. Alternative splicing results in multiple transcript variants. Thy-1 cell surface antigen 7070 NA
SMC5-AS1 ENSG00000268364 NA SMC5 antisense RNA 1 (head to head) ENSG00000268364 NA
CBS ENSG00000160200 The protein encoded by this gene acts as a homotetramer to catalyze the conversion of homocysteine to cystathionine, the first step in the transsulfuration pathway. The encoded protein is allosterically activated by adenosyl-methionine and uses pyridoxal phosphate as a cofactor. Defects in this gene can cause cystathionine beta-synthase deficiency (CBSD), which can lead to homocystinuria. This gene is a major contributor to cellular hydrogen sulfide production. Multiple alternatively spliced transcript variants have been found for this gene. cystathionine-beta-synthase 875 NA
VWF ENSG00000110799 This gene encodes a glycoprotein involved in hemostasis. The encoded preproprotein is proteolytically processed following assembly into large multimeric complexes. These complexes function in the adhesion of platelets to sites of vascular injury and the transport of various proteins in the blood. Mutations in this gene result in von Willebrand disease, an inherited bleeding disorder. An unprocessed pseudogene has been found on chromosome 22. von Willebrand factor 7450 NA
MYH1 ENSG00000109061 Myosin is a major contractile protein which converts chemical energy into mechanical energy through the hydrolysis of ATP. Myosin is a hexameric protein composed of a pair of myosin heavy chains (MYH) and two pairs of nonidentical light chains. Myosin heavy chains are encoded by a multigene family. In mammals at least 10 different myosin heavy chain (MYH) isoforms have been described from striated, smooth, and nonmuscle cells. These isoforms show expression that is spatially and temporally regulated during development. myosin, heavy chain 1, skeletal muscle, adult 4619 NA
ZHX3 ENSG00000174306 This gene encodes a member of the zinc fingers and homeoboxes (ZHX) gene family. The encoded protein contains two C2H2-type zinc fingers and five homeodomains and forms a dimer with itself or with zinc fingers and homeoboxes family member 1. In the nucleus, the dimerized protein interacts with the A subunit of the ubiquitous transcription factor nuclear factor-Y and may function as a transcriptional repressor. zinc fingers and homeoboxes 3 23051 NA
TUBA4A ENSG00000127824 Microtubules of the eukaryotic cytoskeleton perform essential and diverse functions and are composed of a heterodimer of alpha and beta tubulin. The genes encoding these microtubule constituents are part of the tubulin superfamily, which is composed of six distinct families. Genes from the alpha, beta and gamma tubulin families are found in all eukaryotes. The alpha and beta tubulins represent the major components of microtubules, while gamma tubulin plays a critical role in the nucleation of microtubule assembly. There are multiple alpha and beta tubulin genes and they are highly conserved among and between species. This gene encodes an alpha tubulin that is a highly conserved homolog of a rat testis-specific alpha tubulin. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. tubulin alpha 4a 7277 NA
TPM1 ENSG00000140416 This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. tropomyosin 1 (alpha) 7168 NA
CDC42EP5 ENSG00000167617 Cell division control protein 42 (CDC42), a small Rho GTPase, regulates the formation of F-actin-containing structures through its interaction with the downstream effector proteins. The protein encoded by this gene is a member of the Borg (binder of Rho GTPases) family of CDC42 effector proteins. Borg family proteins contain a CRIB (Cdc42/Rac interactive-binding) domain. They bind to CDC42 and regulate its function negatively. The encoded protein may inhibit c-Jun N-terminal kinase (JNK) independently of CDC42 binding. The protein may also play a role in septin organization and inducing pseudopodia formation in fibroblasts. CDC42 effector protein 5 148170 NA
SLC38A1 ENSG00000111371 Amino acid transporters play essential roles in the uptake of nutrients, production of energy, chemical metabolism, detoxification, and neurotransmitter cycling. SLC38A1 is an important transporter of glutamine, an intermediate in the detoxification of ammonia and the production of urea. Glutamine serves as a precursor for the synaptic transmitter, glutamate (Gu et al., 2001 [PubMed 11325958]). solute carrier family 38 member 1 81539 NA
RP11-730A19.9 ENSG00000234175 NA NA ENSG00000234175 NA
CLDN5 ENSG00000184113 This gene encodes a member of the claudin family. Claudins are integral membrane proteins and components of tight junction strands. Tight junction strands serve as a physical barrier to prevent solutes and water from passing freely through the paracellular space between epithelial or endothelial cell sheets. Mutations in this gene have been found in patients with velocardiofacial syndrome. Alternatively spliced transcript variants encoding the same protein have been found for this gene. claudin 5 7122 NA
AC002398.12 ENSG00000267328 NA NA ENSG00000267328 NA
SFTPC ENSG00000168484 This gene encodes the pulmonary-associated surfactant protein C (SPC), an extremely hydrophobic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 2, also called pulmonary alveolar proteinosis due to surfactant protein C deficiency, and are associated with interstitial lung disease in older infants, children, and adults. Alternatively spliced transcript variants encoding different protein isoforms have been identified. surfactant protein C 6440 NA
NCS1 ENSG00000107130 This gene is a member of the neuronal calcium sensor gene family, which encode calcium-binding proteins expressed predominantly in neurons. The protein encoded by this gene regulates G protein-coupled receptor phosphorylation in a calcium-dependent manner and can substitute for calmodulin. The protein is associated with secretory granules and modulates synaptic transmission and synaptic plasticity. Multiple transcript variants encoding different isoforms have been found for this gene. neuronal calcium sensor 1 23413 NA
RP11-2H8.2 ENSG00000257410 NA NA ENSG00000257410 NA
DBNDD2 ENSG00000244274 NA dysbindin domain containing 2 55861 NA
CALM1 ENSG00000198668 This gene encodes a member of the EF-hand calcium-binding protein family. It is one of three genes which encode an identical calcium binding protein which is one of the four subunits of phosphorylase kinase. Two pseudogenes have been identified on chromosome 7 and X. Multiple transcript variants encoding different isoforms have been found for this gene. calmodulin 1 (phosphorylase kinase, delta) 801 NA
CALM2 ENSG00000198668 This gene is a member of the calmodulin gene family. There are three distinct calmodulin genes dispersed throughout the genome that encode the identical protein, but differ at the nucleotide level. Calmodulin is a calcium binding protein that plays a role in signaling pathways, cell cycle progression and proliferation. Several infants with severe forms of long-QT syndrome (LQTS) who displayed life-threatening ventricular arrhythmias together with delayed neurodevelopment and epilepsy were found to have mutations in either this gene or another member of the calmodulin gene family (PMID:23388215). Mutations in this gene have also been identified in patients with less severe forms of LQTS (PMID:24917665), while mutations in another calmodulin gene family member have been associated with catecholaminergic polymorphic ventricular tachycardia (CPVT)(PMID:23040497), a rare disorder thought to be the cause of a significant fraction of sudden cardiac deaths in young individuals. Pseudogenes of this gene are found on chromosomes 10, 13, and 17. Alternative splicing results in multiple transcript variants encoding different isoforms. calmodulin 2 (phosphorylase kinase, delta) 805 NA
RPS6KA5 ENSG00000100784 NA ribosomal protein S6 kinase A5 9252 NA
RP5-906A24.2 ENSG00000266101 NA NA ENSG00000266101 NA
NA ENSG00000213165 NA NA NA TRUE
IL17RC ENSG00000163702 This gene encodes a single-pass type I membrane protein that shares similarity with the interleukin-17 receptor (IL-17RA). Unlike IL-17RA, which is predominantly expressed in hemopoietic cells, and binds with high affinity to only IL-17A, this protein is expressed in nonhemopoietic tissues, and binds both IL-17A and IL-17F with similar affinities. The proinflammatory cytokines, IL-17A and IL-17F, have been implicated in the progression of inflammatory and autoimmune diseases. Multiple alternatively spliced transcript variants encoding different isoforms have been detected for this gene, and it has been proposed that soluble, secreted proteins lacking transmembrane and intracellular domains may function as extracellular antagonists to cytokine signaling. interleukin 17 receptor C 84818 NA
ABL1 ENSG00000097007 This gene is a protooncogene that encodes a protein tyrosine kinase involved in a variety of cellular processes, including cell division, adhesion, differentiation, and response to stress. The activity of the protein is negatively regulated by its SH3 domain, whereby deletion of the region encoding this domain results in an oncogene. The ubiquitously expressed protein has DNA-binding activity that is regulated by CDC2-mediated phosphorylation, suggesting a cell cycle function. This gene has been found fused to a variety of translocation partner genes in various leukemias, most notably the t(9;22) translocation that results in a fusion with the 5’ end of the breakpoint cluster region gene (BCR; MIM:151410). Alternative splicing of this gene results in two transcript variants, which contain alternative first exons that are spliced to the remaining common exons. ABL proto-oncogene 1, non-receptor tyrosine kinase 25 NA
BEGAIN ENSG00000183092 NA brain enriched guanylate kinase associated 57596 NA
TPD52L1 ENSG00000111907 This gene encodes a member of a family of proteins that contain coiled-coil domains and may form hetero- or homomers. The encoded protein is involved in cell proliferation and calcium signaling. It also interacts with the mitogen-activated protein kinase kinase kinase 5 (MAP3K5/ASK1) and positively regulates MAP3K5-induced apoptosis. Multiple alternatively spliced transcript variants have been observed. tumor protein D52-like 1 7164 NA
SPP1 ENSG00000118785 The protein encoded by this gene is involved in the attachment of osteoclasts to the mineralized bone matrix. The encoded protein is secreted and binds hydroxyapatite with high affinity. The osteoclast vitronectin receptor is found in the cell membrane and may be involved in the binding to this protein. This protein is also a cytokine that upregulates expression of interferon-gamma and interleukin-12. Several transcript variants encoding different isoforms have been found for this gene. secreted phosphoprotein 1 6696 NA
ABAT ENSG00000183044 4-aminobutyrate aminotransferase (ABAT) is responsible for catabolism of gamma-aminobutyric acid (GABA), an important, mostly inhibitory neurotransmitter in the central nervous system, into succinic semialdehyde. The active enzyme is a homodimer of 50-kD subunits complexed to pyridoxal-5-phosphate. The protein sequence is over 95% similar to the pig protein. GABA is estimated to be present in nearly one-third of human synapses. ABAT in liver and brain is controlled by 2 codominant alleles with a frequency in a Caucasian population of 0.56 and 0.44. The ABAT deficiency phenotype includes psychomotor retardation, hypotonia, hyperreflexia, lethargy, refractory seizures, and EEG abnormalities. Multiple alternatively spliced transcript variants encoding the same protein isoform have been found for this gene. 4-aminobutyrate aminotransferase 18 NA
RASA4B ENSG00000170667 NA RAS p21 protein activator 4B 100271927 NA
CHMP1B ENSG00000255112 CHMP1B belongs to the chromatin-modifying protein/charged multivesicular body protein (CHMP) family. These proteins are components of ESCRT-III (endosomal sorting complex required for transport III), a complex involved in degradation of surface receptor proteins and formation of endocytic multivesicular bodies (MVBs). Some CHMPs have both nuclear and cytoplasmic/vesicular distributions, and one such CHMP, CHMP1A (MIM 164010), is required for both MVB formation and regulation of cell cycle progression (Tsang et al., 2006 [PubMed 16730941]). charged multivesicular body protein 1B 57132 NA
HSPB6 ENSG00000004776 This locus encodes a heat shock protein. The encoded protein likely plays a role in smooth muscle relaxation. heat shock protein family B (small) member 6 126393 NA
RAB11FIP4 ENSG00000131242 Proteins of the large Rab GTPase family (see RAB1A; MIM 179508) have regulatory roles in the formation, targeting, and fusion of intracellular transport vesicles. RAB11FIP4 is one of many proteins that interact with and regulate Rab GTPases (Hales et al., 2001 [PubMed 11495908]). RAB11 family interacting protein 4 84440 NA
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 17, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
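
The Factor 17 table above contains two rows flagged TRUE in its last column with no symbol (ENSG00000152268, ENSG00000213165) and one query, ENSG00000198668, that appears twice (CALM1 and CALM2), so writing out$query directly puts unmatched and repeated IDs into the file. A minimal sketch of one way to filter them first is given below. It assumes the last column is the notfound flag returned by mygene::queryMany; out_toy, matched, ids and out_file are hypothetical names used only for illustration, and the output goes to a temporary file rather than to the path used above.

```r
# Toy stand-in for the data frame returned by mygene::queryMany().
# Column names are an assumption based on the table above; the four
# rows are copied from the Factor 17 annotations.
out_toy <- data.frame(
  query    = c("ENSG00000198668", "ENSG00000198668",
               "ENSG00000152268", "ENSG00000175899"),
  symbol   = c("CALM1", "CALM2", NA, "A2M"),
  notfound = c(NA, NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Keep rows whose query was matched (notfound is NA rather than TRUE).
matched <- out_toy[is.na(out_toy$notfound) | !out_toy$notfound, ]

# One entry per query ID, in order of first appearance.
ids <- unique(matched$query)

# Write one ID per line, as in the write.table call above,
# but to a temporary file so nothing in ../utilities is touched.
out_file <- tempfile(fileext = ".txt")
write.table(ids, out_file, col.names = FALSE, row.names = FALSE, quote = FALSE)
```

Whether unmatched IDs should be dropped depends on how the gene lists are used downstream; if the file is meant to record every top gene of the factor, only the unique() step applies.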

Factor 18 Annotations

out <- mygene::queryMany(gene_list[18,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
kable(as.data.frame(out))
X_id name summary symbol query
83886 protease, serine 27 This gene is located within a large protease gene cluster on chromosome 16. It belongs to the group-1 subfamily of serine proteases. The encoded protein is a secreted tryptic serine protease and is expressed mainly in the pancreas. Alternative splicing results in multiple transcript variants. PRSS27 ENSG00000172382
4118 mal, T-cell differentiation protein The protein encoded by this gene is a highly hydrophobic integral membrane protein belonging to the MAL family of proteolipids. The protein has been localized to the endoplasmic reticulum of T-cells and is a candidate linker protein in T-cell signal transduction. In addition, this proteolipid is localized in compact myelin of cells in the nervous system and has been implicated in myelin biogenesis and/or function. The protein plays a role in the formation, stabilization and maintenance of glycosphingolipid-enriched membrane microdomains. Down-regulation of this gene has been associated with a variety of human epithelial malignancies. Alternative splicing produces four transcript variants which vary from each other by the presence or absence of alternatively spliced exons 2 and 3. MAL ENSG00000172005
23650 tripartite motif containing 29 The protein encoded by this gene belongs to the TRIM protein family. It has multiple zinc finger motifs and a leucine zipper motif. It has been proposed to form homo- or heterodimers which are involved in nucleic acid binding. Thus, it may act as a transcriptional regulatory factor involved in carcinogenesis and/or differentiation. It may also function in the suppression of radiosensitivity since it is associated with ataxia telangiectasia phenotype. TRIM29 ENSG00000137699
7051 transglutaminase 1 The protein encoded by this gene is a membrane protein that catalyzes the addition of an alkyl group from an alkylamine to a glutamine residue of a protein, forming an alkylglutamine in the protein. This protein alkylation leads to crosslinking of proteins and catenation of polyamines to proteins. This gene contains either one or two copies of a 22 nt repeat unit in its 3’ UTR. Mutations in this gene have been associated with autosomal recessive lamellar ichthyosis (LI) and nonbullous congenital ichthyosiform erythroderma (NCIE). TGM1 ENSG00000092295
6273 S100 calcium binding protein A2 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may have a tumor suppressor function. Chromosomal rearrangements and altered expression of this gene have been implicated in breast cancer. S100A2 ENSG00000196754
2706 gap junction protein beta 2 This gene encodes a member of the gap junction protein family. The gap junctions were first characterized by electron microscopy as regionally specialized structures on plasma membranes of contacting adherent cells. These structures were shown to consist of cell-to-cell channels that facilitate the transfer of ions and small molecules between cells. The gap junction proteins, also known as connexins, purified from fractions of enriched gap junctions from different tissues differ. According to sequence similarities at the nucleotide and amino acid levels, the gap junction proteins are divided into two categories, alpha and beta. Mutations in this gene are responsible for as much as 50% of pre-lingual, recessive deafness. GJB2 ENSG00000165474
5753 protein tyrosine kinase 6 The protein encoded by this gene is a cytoplasmic nonreceptor protein kinase which may function as an intracellular signal transducer in epithelial tissues. Overexpression of this gene in mammary epithelial cells leads to sensitization of the cells to epidermal growth factor and results in a partially transformed phenotype. Expression of this gene has been detected at low levels in some breast tumors but not in normal breast tissue. The encoded protein has been shown to undergo autophosphorylation. Alternative splicing results in multiple transcript variants. PTK6 ENSG00000101213
11005 serine peptidase inhibitor, Kazal type 5 This gene encodes a multidomain serine protease inhibitor that contains 15 potential inhibitory domains. The encoded preproprotein is proteolytically processed to generate multiple protein products, which may exhibit unique activities and specificities. These proteins may play a role in skin and hair morphogenesis, as well as anti-inflammatory and antimicrobial protection of mucous epithelia. Mutations in this gene may result in Netherton syndrome, a disorder characterized by ichthyosis, defective cornification, and atopy. This gene is present in a gene cluster on chromosome 5. Alternative splicing results in multiple transcript variants. SPINK5 ENSG00000133710
147645 V-set and immunoglobulin domain containing 10 like NA VSIG10L ENSG00000186806
27076 LY6/PLAUR domain containing 3 NA LYPD3 ENSG00000124466
23508 tetratricopeptide repeat domain 9 This gene encodes a protein that contains three tetratricopeptide repeats. The gene has been shown to be hormonally regulated in breast cancer cells and may play a role in cancer cell invasion and metastasis. TTC9 ENSG00000133985
1382 cellular retinoic acid binding protein 2 This gene encodes a member of the retinoic acid (RA, a form of vitamin A) binding protein family and lipocalin/cytosolic fatty-acid binding protein family. The protein is a cytosol-to-nuclear shuttling protein, which facilitates RA binding to its cognate receptor complex and transfer to the nucleus. It is involved in the retinoid signaling pathway, and is associated with increased circulating low-density lipoprotein cholesterol. Alternatively spliced transcript variants encoding the same protein have been found for this gene. CRABP2 ENSG00000143320
ENSG00000271795 NA NA CTC-251D13.1 ENSG00000271795
375791 cysteine rich tail 1 NA CYSRT1 ENSG00000197191
123099 delta(4)-desaturase, sphingolipid 2 This gene encodes a bifunctional enzyme that is involved in the biosynthesis of phytosphingolipids in human skin and in other phytosphingolipid-containing tissues. This enzyme can act as a sphingolipid delta(4)-desaturase, and also as a sphingolipid C4-hydroxylase. DEGS2 ENSG00000168350
1476 cystatin B The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and kininogens. This gene encodes a stefin that functions as an intracellular thiol protease inhibitor. The protein is able to form a dimer stabilized by noncovalent forces, inhibiting papain and cathepsins l, h and b. The protein is thought to play a role in protecting against the proteases leaking from lysosomes. Evidence indicates that mutations in this gene are responsible for the primary defects in patients with progressive myoclonic epilepsy (EPM1). CSTB ENSG00000160213
3880 keratin 19 The protein encoded by this gene is a member of the keratin family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. The type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. Unlike its related family members, this smallest known acidic cytokeratin is not paired with a basic cytokeratin in epithelial cells. It is specifically expressed in the periderm, the transiently superficial layer that envelopes the developing epidermis. The type I cytokeratins are clustered in a region of chromosome 17q12-q21. KRT19 ENSG00000171345
8000 prostate stem cell antigen This gene encodes a glycosylphosphatidylinositol-anchored cell membrane glycoprotein. In addition to being highly expressed in the prostate it is also expressed in the bladder, placenta, colon, kidney, and stomach. This gene is up-regulated in a large proportion of prostate cancers and is also detected in cancers of the bladder and pancreas. This gene includes a polymorphism that results in an upstream start codon in some individuals; this polymorphism is thought to be associated with a risk for certain gastric and bladder cancers. Alternative splicing results in multiple transcript variants. PSCA ENSG00000167653
7074 T-cell lymphoma invasion and metastasis 1 NA TIAM1 ENSG00000156299
83543 allograft inflammatory factor 1 like NA AIF1L ENSG00000126878
1893 extracellular matrix protein 1 This gene encodes a soluble protein that is involved in endochondral bone formation, angiogenesis, and tumor biology. It also interacts with a variety of extracellular and structural proteins, contributing to the maintenance of skin integrity and homeostasis. Mutations in this gene are associated with lipoid proteinosis disorder (also known as hyalinosis cutis et mucosae or Urbach-Wiethe disease) that is characterized by generalized thickening of skin, mucosae and certain viscera. Alternatively spliced transcript variants encoding distinct isoforms have been described for this gene. ECM1 ENSG00000143369
6279 S100 calcium binding protein A8 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and as a cytokine. Altered expression of this protein is associated with the disease cystic fibrosis. Multiple transcript variants encoding different isoforms have been found for this gene. S100A8 ENSG00000143546
218 aldehyde dehydrogenase 3 family member A1 Aldehyde dehydrogenases oxidize various aldehydes to the corresponding acids. They are involved in the detoxification of alcohol-derived acetaldehyde and in the metabolism of corticosteroids, biogenic amines, neurotransmitters, and lipid peroxidation. The enzyme encoded by this gene forms a cytoplasmic homodimer that preferentially oxidizes aromatic and medium-chain (6 carbons or more) saturated and unsaturated aldehyde substrates. It is thought to promote resistance to UV and 4-hydroxy-2-nonenal-induced oxidative damage in the cornea. The gene is located within the Smith-Magenis syndrome region on chromosome 17. Multiple alternatively spliced variants, encoding the same protein, have been identified. ALDH3A1 ENSG00000108602
57402 S100 calcium binding protein A14 This gene encodes a member of the S100 protein family which contains an EF-hand motif and binds calcium. The gene is located in a cluster of S100 genes on chromosome 1. Levels of the encoded protein have been found to be lower in cancerous tissue and associated with metastasis suggesting a tumor suppressor function (PMID: 19956863, 19351828). S100A14 ENSG00000189334
84518 cornifelin NA CNFN ENSG00000105427
55686 melanoregulin NA MREG ENSG00000118242
ENSG00000249790 NA NA RP11-20D14.6 ENSG00000249790
2669 GTP binding protein overexpressed in skeletal muscle The protein encoded by this gene belongs to the RAD/GEM family of GTP-binding proteins. It is associated with the inner face of the plasma membrane and could play a role as a regulatory protein in receptor-mediated signal transduction. Alternative splicing occurs at this locus and two transcript variants encoding the same protein have been identified. GEM ENSG00000164949
6288 serum amyloid A1 This gene encodes a member of the serum amyloid A family of apolipoproteins. The encoded preproprotein is proteolytically processed to generate the mature protein. This protein is a major acute phase protein that is highly expressed in response to inflammation and tissue injury. This protein also plays an important role in HDL metabolism and cholesterol homeostasis. High levels of this protein are associated with chronic inflammatory diseases including atherosclerosis, rheumatoid arthritis, Alzheimer’s disease and Crohn’s disease. This protein may also be a potential biomarker for certain tumors. Alternate splicing results in multiple transcript variants that encode the same protein. A pseudogene of this gene is found on chromosome 11. SAA1 ENSG00000173432
53905 dual oxidase 1 The protein encoded by this gene is a glycoprotein and a member of the NADPH oxidase family. The synthesis of thyroid hormone is catalyzed by a protein complex located at the apical membrane of thyroid follicular cells. This complex contains an iodide transporter, thyroperoxidase, and a peroxide generating system that includes proteins encoded by this gene and the similar DUOX2 gene. This protein is known as dual oxidase because it has both a peroxidase homology domain and a gp91phox domain. This protein generates hydrogen peroxide and thereby plays a role in the activity of thyroid peroxidase, lactoperoxidase, and in lactoperoxidase-mediated antimicrobial defense at mucosal surfaces. Two alternatively spliced transcript variants encoding the same protein have been described for this gene. DUOX1 ENSG00000137857
5493 periplakin The protein encoded by this gene is a component of desmosomes and of the epidermal cornified envelope in keratinocytes. The N-terminal domain of this protein interacts with the plasma membrane and its C-terminus interacts with intermediate filaments. Through its rod domain, this protein forms complexes with envoplakin. This protein may serve as a link between the cornified envelope and desmosomes as well as intermediate filaments. AKT1/PKB, a protein kinase mediating a variety of cell growth and survival signaling processes, is reported to interact with this protein, suggesting a possible role for this protein as a localization signal in AKT1-mediated signaling. PPL ENSG00000118898
29841 grainyhead like transcription factor 1 This gene encodes a member of the grainyhead family of transcription factors. The encoded protein can exist as a homodimer or can form heterodimers with sister-of-mammalian grainyhead or brother-of-mammalian grainyhead. This protein functions as a transcription factor during development. GRHL1 ENSG00000134317
ENSG00000242396 NA NA RP11-67L3.5 ENSG00000242396
53833 interleukin 20 receptor subunit beta IL20RB and IL20RA (MIM 605620) form a heterodimeric receptor for interleukin-20 (IL20; MIM 605619) (Blumberg et al., 2001 [PubMed 11163236]). IL20RB ENSG00000174564
6820 sulfotransferase family 2B member 1 Sulfotransferase enzymes catalyze the sulfate conjugation of many hormones, neurotransmitters, drugs, and xenobiotic compounds. These cytosolic enzymes are different in their tissue distributions and substrate specificities. The gene structure (number and length of exons) is similar among family members. This gene sulfates dehydroepiandrosterone but not 4-nitrophenol, a typical substrate for the phenol and estrogen sulfotransferase subfamilies. Two alternatively spliced variants that encode different isoforms have been described. SULT2B1 ENSG00000088002
1475 cystatin A The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins, and kininogens. This gene encodes a stefin that functions as a cysteine protease inhibitor, forming tight complexes with papain and the cathepsins B, H, and L. The protein is one of the precursor proteins of cornified cell envelope in keratinocytes and plays a role in epidermal development and maintenance. Stefins have been proposed as prognostic and diagnostic tools for cancer. CSTA ENSG00000121552
221692 phosphatase and actin regulator 1 The protein encoded by this gene is a member of the phosphatase and actin regulator family of proteins. This family member can bind actin and regulate the reorganization of the actin cytoskeleton. It plays a role in tubule formation and in endothelial cell survival. Polymorphisms in this gene are associated with susceptibility to myocardial infarction, coronary artery disease and cervical artery dissection. Alternative splicing of this gene results in multiple transcript variants. PHACTR1 ENSG00000112137
257000 tissue differentiation-inducing non-protein coding RNA This gene produces a spliced long non-coding RNA that is required for normal epidermal differentiation. This transcript regulates the expression of genes involved in the differentiation of epidermal tissue. Mutations in some of the genes targeted by this transcript have been implicated in epidermal skin diseases. TINCR ENSG00000223573
3040 hemoglobin subunit alpha 2 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. HBA2 ENSG00000188536
301 annexin A1 This gene encodes a membrane-localized protein that binds phospholipids. This protein inhibits phospholipase A2 and has anti-inflammatory activity. Loss of function or expression of this gene has been detected in multiple tumors. ANXA1 ENSG00000135046
90993 cAMP responsive element binding protein 3 like 1 The protein encoded by this gene is normally found in the membrane of the endoplasmic reticulum (ER). However, upon stress to the ER, the encoded protein is cleaved and the released cytoplasmic transcription factor domain translocates to the nucleus. There it activates the transcription of target genes by binding to box-B elements. CREB3L1 ENSG00000157613
115948 coiled-coil domain containing 151 This gene encodes a protein containing coiled-coil domains. The encoded protein functions in outer dynein arm assembly and is required for motile cilia function. Mutations in this gene result in primary ciliary dyskinesia. Alternative splicing results in multiple transcript variants encoding different isoforms. CCDC151 ENSG00000198003
2769 G protein subunit alpha 15 NA GNA15 ENSG00000060558
7464 coronin 2A This gene encodes a member of the WD repeat protein family. WD repeats are minimally conserved regions of approximately 40 amino acids typically bracketed by gly-his and trp-asp (GH-WD), which may facilitate formation of heterotrimeric or multiprotein complexes. Members of this family are involved in a variety of cellular processes, including cell cycle progression, signal transduction, apoptosis, and gene regulation. This protein contains 5 WD repeats, and has a structural similarity with actin-binding proteins: the D. discoideum coronin and the human p57 protein, suggesting that this protein may also be an actin-binding protein that regulates cell motility. Alternative splicing of this gene generates 2 transcript variants. CORO2A ENSG00000106789
149428 BCL2/adenovirus E1B 19kD interacting protein like The protein encoded by this gene interacts with several other proteins, such as BCL2, ARHGAP1, MIF and GFER. It may function as a bridge molecule between BCL2 and ARHGAP1/CDC42 in promoting cell death. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. BNIPL ENSG00000163141
220963 solute carrier family 16 member 9 NA SLC16A9 ENSG00000165449
2524 fucosyltransferase 2 The protein encoded by this gene is a Golgi stack membrane protein that is involved in the creation of a precursor of the H antigen, which is required for the final step in the soluble A and B antigen synthesis pathway. This gene is one of two encoding the galactoside 2-L-fucosyltransferase enzyme. Two transcript variants encoding the same protein have been found for this gene. FUT2 ENSG00000176920
3039 hemoglobin subunit alpha 1 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. HBA1 ENSG00000206172
1401 C-reactive protein, pentraxin-related The protein encoded by this gene belongs to the pentaxin family. It is involved in several host defense related functions based on its ability to recognize foreign pathogens and damaged cells of the host and to initiate their elimination by interacting with humoral and cellular effector systems in the blood. Consequently, the level of this protein in plasma increases greatly during acute phase response to tissue injury, infection, or other inflammatory stimuli. CRP ENSG00000132693
8581 lymphocyte antigen 6 complex, locus D NA LY6D ENSG00000167656
140576 S100 calcium binding protein A16 NA S100A16 ENSG00000188643
80063 activating transcription factor 7 interacting protein 2 NA ATF7IP2 ENSG00000166669
3038 hyaluronan synthase 3 The protein encoded by this gene is involved in the synthesis of the unbranched glycosaminoglycan hyaluronan, or hyaluronic acid, which is a major constituent of the extracellular matrix. This gene is a member of the NODC/HAS gene family. Compared to the proteins encoded by other members of this gene family, this protein appears to be more of a regulator of hyaluronan synthesis. Alternative splicing results in multiple transcript variants. HAS3 ENSG00000103044
ENSG00000239556 NA NA AC004951.5 ENSG00000239556
220 aldehyde dehydrogenase 1 family member A3 This gene encodes an aldehyde dehydrogenase enzyme that uses retinal as a substrate. Mutations in this gene have been associated with microphthalmia, isolated 8, and expression changes have also been detected in tumor cells. Alternative splicing results in multiple transcript variants. ALDH1A3 ENSG00000184254
239 arachidonate 12-lipoxygenase, 12S type NA ALOX12 ENSG00000108839
285973 autophagy related 9B This gene functions in the regulation of autophagy, a lysosomal degradation pathway. This gene also functions as an antisense transcript in the posttranscriptional regulation of the endothelial nitric oxide synthase 3 gene, which has 3’ overlap with this gene on the opposite strand. Mutations in this gene and disruption of the autophagy process have been associated with multiple cancers. Alternative splicing results in multiple transcript variants. ATG9B ENSG00000181652
441869 ankyrin repeat domain 65 NA ANKRD65 ENSG00000235098
400759 guanylate binding protein 1 pseudogene 1 NA GBP1P1 ENSG00000225492
1030 cyclin-dependent kinase inhibitor 2B This gene lies adjacent to the tumor suppressor gene CDKN2A in a region that is frequently mutated and deleted in a wide variety of tumors. This gene encodes a cyclin-dependent kinase inhibitor, which forms a complex with CDK4 or CDK6, and prevents the activation of the CDK kinases, thus the encoded protein functions as a cell growth regulator that controls cell cycle G1 progression. The expression of this gene was found to be dramatically induced by TGF beta, which suggested its role in the TGF beta induced growth inhibition. Two alternatively spliced transcript variants of this gene, which encode distinct proteins, have been reported. CDKN2B ENSG00000147883
ENSG00000253520 NA NA RP11-798K23.5 ENSG00000253520
9120 solute carrier family 16 member 6 NA SLC16A6 ENSG00000108932
84842 4-hydroxyphenylpyruvate dioxygenase like NA HPDL ENSG00000186603
103752584 CCND2 antisense RNA 1 NA CCND2-AS1 ENSG00000256164
51195 Rap guanine nucleotide exchange factor like 1 NA RAPGEFL1 ENSG00000108352
ENSG00000258444 NA NA CTD-2201G16.1 ENSG00000258444
84283 transmembrane protein 79 NA TMEM79 ENSG00000163472
23138 NEDD4 binding protein 3 NA N4BP3 ENSG00000145911
384 arginase 2 Arginase catalyzes the hydrolysis of arginine to ornithine and urea. At least two isoforms of mammalian arginase exists (types I and II) which differ in their tissue distribution, subcellular localization, immunologic crossreactivity and physiologic function. The type II isoform encoded by this gene, is located in the mitochondria and expressed in extra-hepatic tissues, especially kidney. The physiologic role of this isoform is poorly understood; it is thought to play a role in nitric oxide and polyamine metabolism. Transcript variants of the type II gene resulting from the use of alternative polyadenylation sites have been described. ARG2 ENSG00000081181
ENSG00000224769 mucin 20, cell surface associated pseudogene 1 NA MUC20P1 ENSG00000224769
4625 myosin, heavy chain 7, cardiac muscle, beta Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. MYH7 ENSG00000092054
93233 coiled-coil domain containing 114 This gene encodes a coiled-coil domain-containing protein that is a component of the outer dynein arm docking complex in cilia cells. Mutations in this gene may cause primary ciliary dyskinesia 20. CCDC114 ENSG00000105479
171177 ras homolog family member V NA RHOV ENSG00000104140
6450 SH3 domain binding glutamate rich protein NA SH3BGR ENSG00000185437
ENSG00000268603 NA NA RP11-316O14.1 ENSG00000268603
3043 hemoglobin subunit beta The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. HBB ENSG00000244734
6550 solute carrier family 9 member A3 The protein encoded by this gene is an epithelial brush border Na/H exchanger that uses an inward sodium ion gradient to expel acids from the cell. Defects in this gene are a cause of congenital secretory sodium diarrhea. Pseudogenes of this gene exist on chromosomes 10 and 22. SLC9A3 ENSG00000066230
81558 family with sequence similarity 117 member A NA FAM117A ENSG00000121104
6638 small nuclear ribonucleoprotein polypeptide N The protein encoded by this gene is one polypeptide of a small nuclear ribonucleoprotein complex and belongs to the snRNP SMB/SMN family. The protein plays a role in pre-mRNA processing, possibly tissue-specific alternative splicing events. Although individual snRNPs are believed to recognize specific nucleic acid sequences through RNA-RNA base pairing, the specific role of this family member is unknown. The protein arises from a bicistronic transcript that also encodes a protein identified as the SNRPN upstream reading frame (SNURF). Multiple transcription initiation sites have been identified and extensive alternative splicing occurs in the 5’ untranslated region. Additional splice variants have been described but sequences for the complete transcripts have not been determined. The 5’ UTR of this gene has been identified as an imprinting center. Alternative splicing or deletion caused by a translocation event in this paternally-expressed region is responsible for Angelman syndrome or Prader-Willi syndrome due to parental imprint switch failure. SNRPN ENSG00000128739
57210 solute carrier family 45 member 4 NA SLC45A4 ENSG00000022567
1824 desmocollin 2 This gene encodes a member of the desmocollin protein subfamily. Desmocollins, along with desmogleins, are cadherin-like transmembrane glycoproteins that are major components of the desmosome. Desmosomes are cell-cell junctions that help resist shearing forces and are found in high concentrations in cells subject to mechanical stress. This gene is found in a cluster with other desmocollin family members on chromosome 18. Mutations in this gene are associated with arrhythmogenic right ventricular dysplasia-11, and reduced protein expression has been described in several types of cancer. Alternative splicing results in multiple transcript variants. DSC2 ENSG00000134755
83538 tetratricopeptide repeat domain 25 NA TTC25 ENSG00000204815
79586 chondroitin polymerizing factor NA CHPF ENSG00000123989
7137 troponin I3, cardiac type Troponin I (TnI), along with troponin T (TnT) and troponin C (TnC), is one of 3 subunits that form the troponin complex of the thin filaments of striated muscle. TnI is the inhibitory subunit; blocking actin-myosin interactions and thereby mediating striated muscle relaxation. The TnI subfamily contains three genes: TnI-skeletal-fast-twitch, TnI-skeletal-slow-twitch, and TnI-cardiac. This gene encodes the TnI-cardiac protein and is exclusively expressed in cardiac muscle tissues. Mutations in this gene cause familial hypertrophic cardiomyopathy type 7 (CMH7) and familial restrictive cardiomyopathy (RCM). TNNI3 ENSG00000129991
64788 lipase maturation factor 1 The protein encoded by this gene resides in the endoplasmic reticulum, and is involved in the maturation and transport of lipoprotein lipase through the secretory pathway. Mutations in this gene are associated with combined lipase deficiency. Alternatively spliced transcript variants have been found for this gene. LMF1 ENSG00000260807
79007 dysbindin (dystrobrevin binding protein 1) domain containing 1 NA DBNDD1 ENSG00000003249
2243 fibrinogen alpha chain This gene encodes the alpha subunit of the coagulation factor fibrinogen, which is a component of the blood clot. Following vascular injury, the encoded preproprotein is proteolytically processed by thrombin during the conversion of fibrinogen to fibrin. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia, afibrinogenemia and renal amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. FGA ENSG00000171560
6665 SRY-box 15 This gene encodes a member of the SOX (SRY-related HMG-box) family of transcription factors involved in the regulation of embryonic development and in the determination of the cell fate. The encoded protein may act as a transcriptional regulator after forming a protein complex with other proteins. SOX15 ENSG00000129194
113146 AHNAK nucleoprotein 2 NA AHNAK2 ENSG00000185567
ENSG00000249119 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 6 pseudogene 4 NA MTND6P4 ENSG00000249119
666 BCL2-related ovarian killer The protein encoded by this gene belongs to the BCL2 family, members of which form homo- or heterodimers, and act as anti- or proapoptotic regulators that are involved in a wide variety of cellular processes. Studies in rat show that this protein has restricted expression in reproductive tissues, interacts strongly with some antiapoptotic BCL2 proteins, not at all with proapoptotic BCL2 proteins, and induces apoptosis in transfected cells. Thus, this protein represents a proapoptotic member of the BCL2 family. BOK ENSG00000176720
ENSG00000234964 fatty acid binding protein 5 pseudogene 7 NA FABP5P7 ENSG00000234964
1286 collagen type IV alpha 4 chain This gene encodes one of the six subunits of type IV collagen, the major structural component of basement membranes. This particular collagen IV subunit, however, is only found in a subset of basement membranes. Like the other members of the type IV collagen gene family, this gene is organized in a head-to-head conformation with another type IV collagen gene so that each gene pair shares a common promoter. Mutations in this gene are associated with type II autosomal recessive Alport syndrome (hereditary glomerulonephropathy) and with familial benign hematuria (thin basement membrane disease). Two transcripts, differing only in their transcription start sites, have been identified for this gene and, as is common for collagen genes, multiple polyadenylation sites are found in the 3’ UTR. COL4A4 ENSG00000081052
9796 phytanoyl-CoA 2-hydroxylase interacting protein NA PHYHIP ENSG00000168490
ENSG00000255390 NA NA RP11-732A19.5 ENSG00000255390
80832 apolipoprotein L4 The protein encoded by this gene is a member of the apolipoprotein L family and may play a role in lipid exchange and transport throughout the body, as well as in reverse cholesterol transport from peripheral cells to the liver. Two transcript variants encoding two different isoforms have been found for this gene. Only one of the isoforms appears to be a secreted protein. APOL4 ENSG00000100336
8510 matrix metallopeptidase 23B This gene (MMP23B) encodes a member of the matrix metalloproteinase (MMP) family, and it is part of a duplicated region of chromosome 1p36.3. Proteins of the matrix metalloproteinase (MMP) family are involved in the breakdown of extracellular matrix in normal physiological processes, such as embryonic development, reproduction, and tissue remodeling, as well as in disease processes, such as arthritis and metastasis. This gene belongs to the more telomeric copy of the duplicated region. MMP23B ENSG00000189409
ENSG00000237624 3-oxoacid CoA-transferase 2 pseudogene 1 NA OXCT2P1 ENSG00000237624
8642 dachsous cadherin-related 1 This gene is a member of the cadherin superfamily whose members encode calcium-dependent cell-cell adhesion molecules. The encoded protein has a signal peptide, 27 cadherin repeat domains and a unique cytoplasmic region. This particular cadherin family member is expressed in fibroblasts but not in melanocytes or keratinocytes. The cell-cell adhesion of fibroblasts is thought to be necessary for wound healing. DCHS1 ENSG00000166341
4084 MAX dimerization protein 1 This gene encodes a member of the MYC/MAX/MAD network of basic helix-loop-helix leucine zipper transcription factors. The MYC/MAX/MAD transcription factors mediate cellular proliferation, differentiation and apoptosis. The encoded protein antagonizes MYC-mediated transcriptional activation of target genes by competing for the binding partner MAX and recruiting repressor complexes containing histone deacetylases. Mutations in this gene may play a role in acute leukemia, and the encoded protein is a potential tumor suppressor. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. MXD1 ENSG00000059728
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 18, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
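
The export call above hard-codes the factor index (18) in the file name and is repeated for each factor. A minimal sketch of a reusable helper follows. `write_factor_genes` and its arguments are hypothetical names that are not part of the original analysis, the `unique()` step is an added assumption about what is wanted in the output file, and the demo writes to a temporary directory rather than to `../utilities/`.

```r
# Hypothetical helper (not in the original analysis): writes the query IDs
# for factor k, one per line, to <out_dir>/gene_names_clus_<k>.txt.
write_factor_genes <- function(query_ids, k, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  # Assumption: each ID should appear once in the file, so repeated
  # entries in query_ids are dropped before writing.
  write.table(unique(as.character(query_ids)), path,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Demo with Ensembl IDs taken from the Factor 18 table above
# (ECM1, then S100A8 listed twice to exercise the de-duplication).
demo_ids <- c("ENSG00000143369", "ENSG00000143546", "ENSG00000143546")
demo_path <- write_factor_genes(demo_ids, k = 18, out_dir = tempdir())
written <- readLines(demo_path)
```

In the analysis itself the equivalent call would be `write_factor_genes(out$query, 18, "../utilities/GTEX2013_sparse_load_voom")`, assuming that directory already exists.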

Factor 19 Annotations

out <- mygene::queryMany(gene_list[19,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
summary X_id query symbol name notfound
This gene encodes a member of the regulator of G-protein signalling family. This protein is located on the cytosolic side of the plasma membrane and contains a conserved, 120 amino acid motif called the RGS domain. The protein attenuates the signalling activity of G-proteins by binding to activated, GTP-bound G alpha subunits and acting as a GTPase activating protein (GAP), increasing the rate of conversion of the GTP to GDP. This hydrolysis allows the G alpha subunits to bind G beta/gamma subunit heterodimers, forming inactive G-protein heterotrimers, thereby terminating the signal. 5996 ENSG00000090104 RGS1 regulator of G-protein signaling 1 NA
C7 is a component of the complement system. It participates in the formation of Membrane Attack Complex (MAC). People with C7 deficiency are prone to bacterial infection. 730 ENSG00000112936 C7 complement component 7 NA
N-methylation of endogenous and xenobiotic compounds is a major method by which they are degraded. This gene encodes an enzyme that N-methylates indoles such as tryptamine. Alternative splicing results in multiple transcript variants. Read-through transcription also exists between this gene and the downstream FAM188B (family with sequence similarity 188, member B) gene. 11185 ENSG00000241644 INMT indolethylamine N-methyltransferase NA
NA 1264 ENSG00000130176 CNN1 calponin 1 NA
Thymus development depends on a complex series of interactions between thymocytes and the stromal component of the organ. Epithelial V-like antigen (EVA) is expressed in thymus epithelium and strongly downregulated by thymocyte developmental progression. This gene is expressed in the thymus and in several epithelial structures early in embryogenesis. It is highly homologous to the myelin protein zero and, in thymus-derived epithelial cell lines, is poorly soluble in nonionic detergents, strongly suggesting an association to the cytoskeleton. Its capacity to mediate cell adhesion through a homophilic interaction and its selective regulation by T cell maturation might imply the participation of EVA in the earliest phases of thymus organogenesis. The protein bears a characteristic V-type domain and two potential N-glycosylation sites in the extracellular domain; a putative serine phosphorylation site for casein kinase 2 is also present in the cytoplasmic tail. Two transcript variants encoding the same protein have been found for this gene. 10205 ENSG00000149573 MPZL2 myelin protein zero like 2 NA
Tryptases comprise a family of trypsin-like serine proteases, the peptidase family S1. Tryptases are enzymatically active only as heparin-stabilized tetramers, and they are resistant to all known endogenous proteinase inhibitors. Several tryptase genes are clustered on chromosome 16p13.3. These genes are characterized by several distinct features. They have a highly conserved 3’ UTR and contain tandem repeat sequences at the 5’ flank and 3’ UTR which are thought to play a role in regulation of the mRNA stability. These genes have an intron immediately upstream of the initiator Met codon, which separates the site of transcription initiation from protein coding sequence. This feature is characteristic of tryptases but is unusual in other genes. The alleles of this gene exhibit an unusual amount of sequence variation, such that the alleles were once thought to represent two separate genes, alpha and beta 1. Beta tryptases appear to be the main isoenzymes expressed in mast cells; whereas in basophils, alpha tryptases predominate. Tryptases have been implicated as mediators in the pathogenesis of asthma and other allergic and inflammatory disorders. 7177 ENSG00000172236 TPSAB1 tryptase alpha/beta 1 NA
NA ENSG00000263065 ENSG00000263065 AF001548.6 NA NA
This gene encodes a prostaglandin transporter that is a member of the 12-membrane-spanning superfamily of transporters. The encoded protein may be involved in mediating the uptake and clearance of prostaglandins in numerous tissues. 6578 ENSG00000174640 SLCO2A1 solute carrier organic anion transporter family member 2A1 NA
Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. 72 ENSG00000163017 ACTG2 actin, gamma 2, smooth muscle, enteric NA
The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. 4629 ENSG00000133392 MYH11 myosin, heavy chain 11, smooth muscle NA
This antimicrobial gene is one of several CC cytokine genes clustered on the p-arm of chromosome 9. Cytokines are a family of secreted proteins involved in immunoregulatory and inflammatory processes. The CC cytokines are proteins characterized by two adjacent cysteines. Similar to other chemokines the protein encoded by this gene inhibits hemopoiesis and stimulates chemotaxis. This protein is chemotactic in vitro for thymocytes and activated T cells, but not for B cells, macrophages, or neutrophils. The cytokine encoded by this gene may also play a role in mediating homing of lymphocytes to secondary lymphoid organs. It is a high affinity functional ligand for chemokine receptor 7 that is expressed on T and B lymphocytes and a known receptor for another member of the cytokine family (small inducible cytokine A19). 6366 ENSG00000137077 CCL21 C-C motif chemokine ligand 21 NA
This antimicrobial gene is one of several CC cytokine genes clustered on the p-arm of chromosome 9. Cytokines are a family of secreted proteins involved in immunoregulatory and inflammatory processes. The CC cytokines are proteins characterized by two adjacent cysteines. The cytokine encoded by this gene may play a role in normal lymphocyte recirculation and homing. It also plays an important role in trafficking of T cells in thymus, and in T cell and B cell migration to secondary lymphoid organs. It specifically binds to chemokine receptor CCR7. 6363 ENSG00000172724 CCL19 C-C motif chemokine ligand 19 NA
CHMP4C belongs to the chromatin-modifying protein/charged multivesicular body protein (CHMP) family. These proteins are components of ESCRT-III (endosomal sorting complex required for transport III), a complex involved in degradation of surface receptor proteins and formation of endocytic multivesicular bodies (MVBs). Some CHMPs have both nuclear and cytoplasmic/vesicular distributions, and one such CHMP, CHMP1A (MIM 164010), is required for both MVB formation and regulation of cell cycle progression (Tsang et al., 2006 [PubMed 16730941]). 92421 ENSG00000164695 CHMP4C charged multivesicular body protein 4C NA
This gene encodes a membrane-bound protein that is a member of the mucin family. Mucins are O-glycosylated proteins that play an essential role in forming protective mucous barriers on epithelial surfaces. These proteins also play a role in intracellular signaling. This protein is expressed on the apical surface of epithelial cells that line the mucosal surfaces of many different tissues including lung, breast, stomach and pancreas. This protein is proteolytically cleaved into alpha and beta subunits that form a heterodimeric complex. The N-terminal alpha subunit functions in cell-adhesion and the C-terminal beta subunit is involved in cell signaling. Overexpression, aberrant intracellular localization, and changes in glycosylation of this protein have been associated with carcinomas. This gene is known to contain a highly polymorphic variable number tandem repeats (VNTR) domain. Alternate splicing results in multiple transcript variants. 4582 ENSG00000185499 MUC1 mucin 1, cell surface associated NA
This gene encodes a member of the gap junction protein family. The gap junctions were first characterized by electron microscopy as regionally specialized structures on plasma membranes of contacting adherent cells. These structures were shown to consist of cell-to-cell channels that facilitate the transfer of ions and small molecules between cells. The gap junction proteins, also known as connexins, purified from fractions of enriched gap junctions from different tissues differ. According to sequence similarities at the nucleotide and amino acid levels, the gap junction proteins are divided into two categories, alpha and beta. Mutations in this gene are responsible for as much as 50% of pre-lingual, recessive deafness. 2706 ENSG00000165474 GJB2 gap junction protein beta 2 NA
This gene encodes a member of the small leucine-rich proteoglycan (SLRP) family of proteins. The encoded protein induces ectopic bone formation in conjunction with transforming growth factor beta and may regulate osteoblast differentiation. High expression of the encoded protein may be associated with elevated heart left ventricular mass. Alternative splicing results in multiple transcript variants. 4969 ENSG00000106809 OGN osteoglycin NA
NA NA ENSG00000259716 NA NA TRUE
The protein encoded by this gene is a member of the apolipoprotein L family and may play a role in lipid exchange and transport throughout the body, as well as in reverse cholesterol transport from peripheral cells to the liver. Two transcript variants encoding two different isoforms have been found for this gene. Only one of the isoforms appears to be a secreted protein. 80832 ENSG00000100336 APOL4 apolipoprotein L4 NA
Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. 10398 ENSG00000101335 MYL9 myosin light chain 9 NA
The protein encoded by this gene is a member of the flamingo subfamily, part of the cadherin superfamily. The flamingo subfamily consists of nonclassic-type cadherins; a subpopulation that does not interact with catenins. The flamingo cadherins are located at the plasma membrane and have nine cadherin domains, seven epidermal growth factor-like repeats and two laminin A G-type repeats in their ectodomain. They also have seven transmembrane domains, a characteristic unique to this subfamily. It is postulated that these proteins are receptors involved in contact-mediated communication, with cadherin domains acting as homophilic binding regions and the EGF-like domains involved in cell adhesion and receptor-ligand interactions. This particular member is a developmentally regulated, neural-specific gene which plays an unspecified role in early embryogenesis. 9620 ENSG00000075275 CELSR1 cadherin EGF LAG seven-pass G-type receptor 1 NA
NA NA ENSG00000187990 NA NA TRUE
This gene is a member of the type II keratin family clustered on the long arm of chromosome 12. Type I and type II keratins heteropolymerize to form intermediate-sized filaments in the cytoplasm of epithelial cells. The product of this gene typically dimerizes with keratin 18 to form an intermediate filament in simple single-layered epithelial cells. This protein plays a role in maintaining cellular structural integrity and also functions in signal transduction and cellular differentiation. Mutations in this gene cause cryptogenic cirrhosis. Alternatively spliced transcript variants have been found for this gene. 3856 ENSG00000170421 KRT8 keratin 8 NA
MYOC encodes the protein myocilin, which is believed to have a role in cytoskeletal function. MYOC is expressed in many ocular tissues, including the trabecular meshwork, and was revealed to be the trabecular meshwork glucocorticoid-inducible response protein (TIGR). The trabecular meshwork is a specialized eye tissue essential in regulating intraocular pressure, and mutations in MYOC have been identified as the cause of hereditary juvenile-onset open-angle glaucoma. 4653 ENSG00000034971 MYOC myocilin NA
NA NA ENSG00000180672 NA NA TRUE
NA 255743 ENSG00000168743 NPNT nephronectin NA
NA ENSG00000269936 ENSG00000269936 RP11-394O4.5 NA NA
The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. 59 ENSG00000107796 ACTA2 actin, alpha 2, smooth muscle, aorta NA
The protein encoded by this gene binds transforming growth factor beta (TGFB) as it is secreted and targeted to the extracellular matrix. TGFB is biologically latent after secretion and insertion into the extracellular matrix, and sheds TGFB and other proteins upon activation. Defects in this gene may be a cause of cutis laxa and severe pulmonary, gastrointestinal, and urinary abnormalities. Three transcript variants encoding different isoforms have been found for this gene. 8425 ENSG00000090006 LTBP4 latent transforming growth factor beta binding protein 4 NA
The protein encoded by this gene is a member of the lipin family of proteins, and all family members share strong homology in their C-terminal region. This protein is thought to form hetero-oligomers with other lipin family members, while one family member, lipin 1, can also form homo-oligomers. This protein contains conserved motifs for phosphatidate phosphatase 1 (PAP1) activity as well as a domain that interacts with a transcriptional co-activator. Lipin complexes act in the cytoplasm to catalyze the dephosphorylation of phosphatidic acid to produce diacylglycerol, which is the precursor of both triglycerides and phospholipids. Lipin complexes are also thought to regulate gene expression as transcriptional co-activators in the nucleus. Alternative splicing results in multiple transcript variants. 64900 ENSG00000132793 LPIN3 lipin 3 NA
NA 27076 ENSG00000124466 LYPD3 LY6/PLAUR domain containing 3 NA
NA ENSG00000180139 ENSG00000180139 ACTA2-AS1 ACTA2 antisense RNA 1 NA
This gene encodes the water channel protein aquaporin 3. Aquaporins are a family of small integral membrane proteins related to the major intrinsic protein, also known as aquaporin 0. Aquaporin 3 is localized at the basal lateral membranes of collecting duct cells in the kidney. In addition to its water channel function, aquaporin 3 has been found to facilitate the transport of nonionic small solutes such as urea and glycerol, but to a smaller degree. It has been suggested that water channels can be functionally heterogeneous and possess water and solute permeation mechanisms. Alternative splicing of this gene results in multiple transcript variants encoding different isoforms. 360 ENSG00000165272 AQP3 aquaporin 3 (Gill blood group) NA
NA 119587 ENSG00000121898 CPXM2 carboxypeptidase X (M14 family), member 2 NA
The protein encoded by this gene is a member of kinesin-like protein family. This family includes microtubule-dependent molecular motors that transport organelles within cells and move chromosomes during cell division. This protein has been shown to cross-bridge antiparallel microtubules and drive microtubule movement in vitro. Alternate splicing of this gene results in multiple transcript variants. 9493 ENSG00000137807 KIF23 kinesin family member 23 NA
NA ENSG00000232993 ENSG00000232993 RP11-334A14.5 NA NA
Netrin is included in a family of laminin-related secreted proteins. The function of this gene has not yet been defined; however, netrin is thought to be involved in axon guidance and cell migration during development. Mutations and loss of expression of netrin suggest that variation in netrin may be involved in cancer development. 9423 ENSG00000065320 NTN1 netrin 1 NA
This gene encodes a member of the prolyl 3-hydroxylase subfamily of 2-oxo-glutarate-dependent dioxygenases. These enzymes play a critical role in collagen chain assembly, stability and cross-linking by catalyzing post-translational 3-hydroxylation of proline residues. Mutations in this gene are associated with nonsyndromic severe myopia with cataract and vitreoretinal degeneration, and downregulation of this gene may play a role in breast cancer. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 55214 ENSG00000090530 P3H2 prolyl 3-hydroxylase 2 NA
The protein encoded by this gene is a member of the interleukin 1 cytokine family. This protein inhibits the activities of interleukin 1, alpha (IL1A) and interleukin 1, beta (IL1B), and modulates a variety of interleukin 1 related immune and inflammatory responses. This gene and five other closely related cytokine genes form a gene cluster spanning approximately 400 kb on chromosome 2. A polymorphism of this gene is reported to be associated with increased risk of osteoporotic fractures and gastric cancer. Several alternatively spliced transcript variants encoding distinct isoforms have been reported. 3557 ENSG00000136689 IL1RN interleukin 1 receptor antagonist NA
NA 83483 ENSG00000130300 PLVAP plasmalemma vesicle associated protein NA
NA 115572 ENSG00000158246 FAM46B family with sequence similarity 46 member B NA
NA ENSG00000263335 ENSG00000263335 AF001548.5 NA NA
This gene encodes a member of the cingulin family. The encoded protein localizes to both adherens and tight cell-cell junctions and mediates junction assembly and maintenance by regulating the activity of the small GTPases RhoA and Rac1. Heterozygous chromosomal rearrangements resulting in association of the promoter for this gene with the aromatase gene are a cause of aromatase excess syndrome. Alternatively spliced transcript variants have been observed for this gene. 84952 ENSG00000128849 CGNL1 cingulin-like 1 NA
NA 348093 ENSG00000166831 RBPMS2 RNA binding protein with multiple splicing 2 NA
NA ENSG00000249007 ENSG00000249007 RP11-510N19.5 NA NA
The protein encoded by this gene is a leucine-rich repeat protein present in connective tissue extracellular matrix. This protein functions as a molecule anchoring basement membranes to the underlying connective tissue. This protein has been shown to bind type I collagen to basement membranes and type II collagen to cartilage. It also binds the basement membrane heparan sulfate proteoglycan perlecan. This protein is suggested to be involved in the pathogenesis of Hutchinson-Gilford progeria (HGP), which is reported to lack the binding of collagen in basement membranes and cartilage. Alternatively spliced transcript variants encoding the same protein have been observed. 5549 ENSG00000188783 PRELP proline and arginine rich end leucine rich repeat protein NA
NA ENSG00000233429 ENSG00000233429 HOTAIRM1 HOXA transcript antisense RNA, myeloid-specific 1 NA
NA 65124 ENSG00000198142 SOWAHC sosondowah ankyrin repeat domain family member C NA
NA ENSG00000253520 ENSG00000253520 RP11-798K23.5 NA NA
This gene encodes one of the members of the superfamily of potassium channel proteins containing two pore-forming P domains. This channel protein, considered an open rectifier, is widely expressed. It is stimulated by arachidonic acid, and inhibited by internal acidification and volatile anaesthetics. 9424 ENSG00000099337 KCNK6 potassium two pore domain channel subfamily K member 6 NA
NA 120376 ENSG00000214290 COLCA2 colorectal cancer associated 2 NA
This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. In rodents, the homologous protein has been shown to metabolize certain carcinogens; however, the specific function of the human protein has not been determined. 29785 ENSG00000167600 CYP2S1 cytochrome P450 family 2 subfamily S member 1 NA
This gene encodes a member of the small leucine-rich proteoglycan (SLRP) family that includes decorin, biglycan, fibromodulin, keratocan, epiphycan, and osteoglycin. In these bifunctional molecules, the protein moiety binds collagen fibrils and the highly charged hydrophilic glycosaminoglycans regulate interfibrillar spacings. Lumican is the major keratan sulfate proteoglycan of the cornea but is also distributed in interstitial collagenous matrices throughout the body. Lumican may regulate collagen fibril organization and circumferential growth, corneal transparency, and epithelial cell migration and tissue repair. 4060 ENSG00000139329 LUM lumican NA
NA ENSG00000237886 ENSG00000237886 NALT1 NOTCH1 associated lncRNA in T-cell acute lymphoblastic leukemia 1 NA
NA ENSG00000271133 ENSG00000271133 CTA-293F17.1 NA NA
This gene encodes a member of the SFRP family that contains a cysteine-rich domain homologous to the putative Wnt-binding site of Frizzled proteins. SFRPs act as soluble modulators of Wnt signaling. Methylation of this gene is a potential marker for the presence of colorectal cancer. 6423 ENSG00000145423 SFRP2 secreted frizzled related protein 2 NA
The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. 4256 ENSG00000111341 MGP matrix Gla protein NA
NA ENSG00000250786 ENSG00000250786 SNHG18 small nucleolar RNA host gene 18 NA
The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. 125 ENSG00000196616 ADH1B alcohol dehydrogenase 1B (class I), beta polypeptide NA
The product of this gene belongs to the family of G-protein coupled receptors. This family has several receptor subtypes with different pharmacological selectivity, which overlaps in some cases, for various adenosine and uridine nucleotides. This receptor functions as a receptor for extracellular ATP and ADP. In platelets binding to ADP leads to mobilization of intracellular calcium ions via activation of phospholipase C, a change in platelet shape, and probably to platelet aggregation. 5028 ENSG00000169860 P2RY1 purinergic receptor P2Y1 NA
NA NA ENSG00000268913 NA NA TRUE
Cell division control protein 42 (CDC42), a small Rho GTPase, regulates the formation of F-actin-containing structures through its interaction with the downstream effector proteins. The protein encoded by this gene is a member of the Borg (binder of Rho GTPases) family of CDC42 effector proteins. Borg family proteins contain a CRIB (Cdc42/Rac interactive-binding) domain. They bind to CDC42 and regulate its function negatively. The encoded protein may inhibit c-Jun N-terminal kinase (JNK) independently of CDC42 binding. The protein may also play a role in septin organization and inducing pseudopodia formation in fibroblasts. 148170 ENSG00000167617 CDC42EP5 CDC42 effector protein 5 NA
This intronless gene encodes a carcinoma-associated antigen. This antigen is a cell surface receptor that transduces calcium signals. Mutations of this gene have been associated with gelatinous drop-like corneal dystrophy. 4070 ENSG00000184292 TACSTD2 tumor-associated calcium signal transducer 2 NA
This gene encodes a member of the IAP family of proteins that inhibit apoptosis by binding to tumor necrosis factor receptor-associated factors TRAF1 and TRAF2, probably by interfering with activation of ICE-like proteases. The encoded protein inhibits apoptosis induced by serum deprivation but does not affect apoptosis resulting from exposure to menadione, a potent inducer of free radicals. It contains 3 baculovirus IAP repeats and a ring finger domain. Transcript variants encoding the same isoform have been identified. 330 ENSG00000023445 BIRC3 baculoviral IAP repeat containing 3 NA
This gene is a member of the secretory phospholipase A2 family. It is located in a tightly-linked cluster of secretory phospholipase A2 genes on chromosome 1. The encoded enzyme catalyzes the hydrolysis of membrane phospholipids to generate lysophospholipids and free fatty acids including arachidonic acid. It preferentially hydrolyzes linoleoyl-containing phosphatidylcholine substrates. Secretion of this enzyme is thought to induce inflammatory responses in neighboring cells. Alternatively spliced transcript variants have been found, but their full-length nature has not been determined. 5322 ENSG00000127472 PLA2G5 phospholipase A2 group V NA
NA 127435 ENSG00000174348 PODN podocan NA
This gene encodes a protein that anchors intermediate filaments to desmosomal plaques and forms an obligate component of functional desmosomes. Mutations in this gene are the cause of several cardiomyopathies and keratodermas, including skin fragility-woolly hair syndrome. Alternative splicing results in multiple transcript variants. 1832 ENSG00000096696 DSP desmoplakin NA
Members of the band 4.1 protein superfamily, including EPB41L4A, are thought to regulate the interaction between the cytoskeleton and plasma membrane (Ishiguro et al., 2000 [PubMed 10874211]). 64097 ENSG00000129595 EPB41L4A erythrocyte membrane protein band 4.1 like 4A NA
NA 57664 ENSG00000105559 PLEKHA4 pleckstrin homology domain containing A4 NA
This gene belongs to the ephrin receptor subfamily of the protein-tyrosine kinase family. EPH and EPH-related receptors have been implicated in mediating developmental events, particularly in the nervous system. Receptors in the EPH subfamily typically have a single kinase domain and an extracellular region containing a Cys-rich domain and 2 fibronectin type III repeats. The ephrin receptors are divided into 2 groups based on the similarity of their extracellular domain sequences and their affinities for binding ephrin-A and ephrin-B ligands. This gene encodes a protein that binds ephrin-A ligands. Mutations in this gene are the cause of certain genetically-related cataract disorders. 1969 ENSG00000142627 EPHA2 EPH receptor A2 NA
The protein encoded by this gene is a member of the L1 gene family of neural cell adhesion molecules. It is a neural recognition molecule that may be involved in signal transduction pathways. The deletion of one copy of this gene may be responsible for mental defects in patients with 3p- syndrome. This protein may also play a role in the growth of certain cancers. Alternate splicing results in both coding and non-coding variants. 10752 ENSG00000134121 CHL1 cell adhesion molecule L1 like NA
This gene encodes an iron containing glycoprotein which catalyzes the conversion of orthophosphoric monoester to alcohol and orthophosphate. It is the most basic of the acid phosphatases and is the only form not inhibited by L(+)-tartrate. 54 ENSG00000102575 ACP5 acid phosphatase 5, tartrate resistant NA
The protein encoded by this gene is a member of the phospholipase A2 family (PLA2). PLA2s constitute a diverse family of enzymes with respect to sequence, function, localization, and divalent cation requirements. This gene product belongs to group II, which contains secreted form of PLA2, an extracellular enzyme that has a low molecular mass and requires calcium ions for catalysis. It catalyzes the hydrolysis of the sn-2 fatty acid acyl ester bond of phosphoglycerides, releasing free fatty acids and lysophospholipids, and is thought to participate in the regulation of the phospholipid metabolism in biomembranes. Several alternatively spliced transcript variants with different 5’ UTRs have been found for this gene. 5320 ENSG00000188257 PLA2G2A phospholipase A2 group IIA NA
NA 51285 ENSG00000103710 RASL12 RAS like family 12 NA
This gene encodes a protein with similarity to a bovine microfibril-associated protein. The protein has binding specificities for both collagen and carbohydrate. It is thought to be an extracellular matrix protein which is involved in cell adhesion or intercellular interactions. The gene is located within the Smith-Magenis syndrome region. Two transcript variants encoding different isoforms have been found for this gene. 4239 ENSG00000166482 MFAP4 microfibrillar associated protein 4 NA
NA 2012 ENSG00000134531 EMP1 epithelial membrane protein 1 NA
NA 64065 ENSG00000112378 PERP PERP, TP53 apoptosis effector NA
This gene was first identified in a study of human esophageal squamous cell carcinoma tissues. Levels of both the message and protein are reduced in carcinoma samples. In adult human tissues, this gene is expressed in the esophagus, stomach, small intestine, colon and placenta. Alternatively spliced transcript variants that encode the same protein have been identified. 84419 ENSG00000166920 C15orf48 chromosome 15 open reading frame 48 NA
NA ENSG00000272734 ENSG00000272734 ADIRF-AS1 ADIRF antisense RNA 1 NA
NA 79669 ENSG00000114529 C3orf52 chromosome 3 open reading frame 52 NA
VWA1 belongs to the von Willebrand factor (VWF; MIM 613160) A (VWFA) domain superfamily of extracellular matrix proteins and appears to play a role in cartilage structure and function (Fitzgerald et al., 2002 [PubMed 12062410]). 64856 ENSG00000179403 VWA1 von Willebrand factor A domain containing 1 NA
The protein encoded by this gene is a member of the TNF-receptor superfamily. This receptor is highly expressed during embryonic development. It has been shown to interact with TRAF family members, and to activate JNK signaling pathway when overexpressed in cells. This receptor is capable of inducing apoptosis by a caspase-independent mechanism, and it is thought to play an essential role in embryonic development. Alternatively spliced transcript variants encoding distinct isoforms have been described. 55504 ENSG00000127863 TNFRSF19 tumor necrosis factor receptor superfamily member 19 NA
This gene encodes a member of the chloride intracellular channel family of proteins. The gene is part of a large triplicated region found on chromosomes 1, 6, and 21. Alternative splicing results in multiple transcript variants encoding different isoforms. 54102 ENSG00000159212 CLIC6 chloride intracellular channel 6 NA
NA 100506314 ENSG00000247498 LOC100506314 uncharacterized LOC100506314 NA
The protein encoded by this gene is one of six similar proteins that bind insulin-like growth factors I and II (IGF-I and IGF-II). The encoded protein can be secreted into the bloodstream, where it binds IGF-I and IGF-II with high affinity, or it can remain intracellular, interacting with many different ligands. High expression levels of this protein promote the growth of several types of tumors and may be predictive of the chances of recovery of the patient. Several transcript variants, one encoding a secreted isoform and the others encoding nonsecreted isoforms, have been found for this gene. 3485 ENSG00000115457 IGFBP2 insulin like growth factor binding protein 2 NA
Aldehyde dehydrogenases oxidize various aldehydes to the corresponding acids. They are involved in the detoxification of alcohol-derived acetaldehyde and in the metabolism of corticosteroids, biogenic amines, neurotransmitters, and lipid peroxidation. The enzyme encoded by this gene forms a cytoplasmic homodimer that preferentially oxidizes aromatic and medium-chain (6 carbons or more) saturated and unsaturated aldehyde substrates. It is thought to promote resistance to UV and 4-hydroxy-2-nonenal-induced oxidative damage in the cornea. The gene is located within the Smith-Magenis syndrome region on chromosome 17. Multiple alternatively spliced variants, encoding the same protein, have been identified. 218 ENSG00000108602 ALDH3A1 aldehyde dehydrogenase 3 family member A1 NA
NA 415117 ENSG00000178750 STX19 syntaxin 19 NA
The protein encoded by this gene is found as a pentamer and is a major substrate for the cAMP-dependent protein kinase in cardiac muscle. The encoded protein is an inhibitor of cardiac muscle sarcoplasmic reticulum Ca(2+)-ATPase in the unphosphorylated state, but inhibition is relieved upon phosphorylation of the protein. The subsequent activation of the Ca(2+) pump leads to enhanced muscle relaxation rates, thereby contributing to the inotropic response elicited in heart by beta-agonists. The encoded protein is a key regulator of cardiac diastolic function. Mutations in this gene are a cause of inherited human dilated cardiomyopathy with refractory congestive heart failure, and also familial hypertrophic cardiomyopathy. 5350 ENSG00000198523 PLN phospholamban NA
This gene encodes a member of the type 3 G protein-coupling receptor family, characterized by the signature 7-transmembrane domain motif. The encoded protein may be involved in interaction between retinoid acid and G protein signalling pathways. Retinoic acid plays a critical role in development, cellular growth, and differentiation. This gene may play a role in embryonic development and epithelial cell differentiation. 9052 ENSG00000013588 GPRC5A G protein-coupled receptor class C group 5 member A NA
This gene was identified as a retinoid acid (RA) receptor-responsive gene. It encodes a type 1 membrane protein. The expression of this gene is upregulated by tazarotene as well as by retinoic acid receptors. The expression of this gene is found to be downregulated in prostate cancer, which is caused by the methylation of its promoter and CpG island. Alternatively spliced transcript variants encoding distinct isoforms have been observed. 5918 ENSG00000118849 RARRES1 retinoic acid receptor responder 1 NA
NA 57228 ENSG00000170545 SMAGP small cell adhesion glycoprotein NA
This gene encodes the pro-alpha1 chains of type III collagen, a fibrillar collagen that is found in extensible connective tissues such as skin, lung, uterus, intestine and the vascular system, frequently in association with type I collagen. Mutations in this gene are associated with Ehlers-Danlos syndrome types IV, and with aortic and arterial aneurysms. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. 1281 ENSG00000168542 COL3A1 collagen type III alpha 1 chain NA
NA 2810 ENSG00000175793 SFN stratifin NA
NA 123036 ENSG00000165929 TC2N tandem C2 domains, nuclear NA
The protein encoded by this gene belongs to the highly conserved cyclin family, whose members are characterized by a dramatic periodicity in protein abundance throughout the cell cycle. Cyclins function as regulators of CDK kinases. Different cyclins exhibit distinct expression and degradation patterns which contribute to the temporal coordination of each mitotic event. This cyclin forms a complex with and functions as a regulatory subunit of CDK4 or CDK6, whose activity is required for cell cycle G1/S transition. This protein has been shown to interact with tumor suppressor protein Rb and the expression of this gene is regulated positively by Rb. Mutations, amplification and overexpression of this gene, which alters cell cycle progression, are observed frequently in a variety of tumors and may contribute to tumorigenesis. 595 ENSG00000110092 CCND1 cyclin D1 NA
The protein encoded by this gene belongs to the laminin family of secreted molecules. Laminins are heterotrimeric molecules that consist of alpha, beta, and gamma subunits that assemble through a coiled-coil domain. Laminins are essential for formation and function of the basement membrane and have additional functions in regulating cell migration and mechanical signal transduction. This gene encodes an alpha subunit and is responsive to several epithelial-mesenchymal regulators including keratinocyte growth factor, epidermal growth factor and insulin-like growth factor. Mutations in this gene have been identified as the cause of Herlitz type junctional epidermolysis bullosa and laryngoonychocutaneous syndrome. Alternative splicing and alternative promoter usage result in multiple transcript variants. 3909 ENSG00000053747 LAMA3 laminin subunit alpha 3 NA
This gene encodes a member of the superoxide dismutase (SOD) protein family. SODs are antioxidant enzymes that catalyze the conversion of superoxide radicals into hydrogen peroxide and oxygen, which may protect the brain, lungs, and other tissues from oxidative stress. Proteolytic processing of the encoded protein results in the formation of two distinct homotetramers that differ in their ability to interact with the extracellular matrix (ECM). Homotetramers consisting of the intact protein, or type C subunit, exhibit high affinity for heparin and are anchored to the ECM. Homotetramers consisting of a proteolytically cleaved form of the protein, or type A subunit, exhibit low affinity for heparin and do not interact with the ECM. A mutation in this gene may be associated with increased heart disease risk. 6649 ENSG00000109610 SOD3 superoxide dismutase 3, extracellular NA
This gene encodes a protein that is related to epidermal growth factor receptor pathway substrate 8 (EPS8), a substrate for the epidermal growth factor receptor. The function of this protein is unknown. At least two alternatively spliced transcript variants encoding different isoforms have been found for this gene. 54869 ENSG00000131037 EPS8L1 EPS8 like 1 NA
This gene encodes a protein containing a leucine zipper and a transmembrane domain. This gene has been implicated in both Ellis-van Creveld syndrome (EvC) and Weyers acrodental dysostosis. 2121 ENSG00000072840 EVC EvC ciliary complex subunit 1 NA
This gene encodes a secreted chemotactic protein that initiates chemotaxis via the ChemR23 G protein-coupled seven-transmembrane domain ligand. Expression of this gene is upregulated by the synthetic retinoid tazarotene and occurs in a wide variety of tissues. The active protein has several roles, including that as an adipokine and as an antimicrobial protein with activity against bacteria and fungi. 5919 ENSG00000106538 RARRES2 retinoic acid receptor responder 2 NA
NA 81543 ENSG00000160233 LRRC3 leucine rich repeat containing 3 NA
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 19, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)

Factor 20 Annotations

out <- mygene::queryMany(gene_list[20,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name X_id summary symbol query notfound
mal, T-cell differentiation protein 4118 The protein encoded by this gene is a highly hydrophobic integral membrane protein belonging to the MAL family of proteolipids. The protein has been localized to the endoplasmic reticulum of T-cells and is a candidate linker protein in T-cell signal transduction. In addition, this proteolipid is localized in compact myelin of cells in the nervous system and has been implicated in myelin biogenesis and/or function. The protein plays a role in the formation, stabilization and maintenance of glycosphingolipid-enriched membrane microdomains. Down-regulation of this gene has been associated with a variety of human epithelial malignancies. Alternative splicing produces four transcript variants which vary from each other by the presence or absence of alternatively spliced exons 2 and 3. MAL ENSG00000172005 NA
chloride intracellular channel 3 9022 Chloride channels are a diverse group of proteins that regulate fundamental cellular processes including stabilization of cell membrane potential, transepithelial transport, maintenance of intracellular pH, and regulation of cell volume. Chloride intracellular channel 3 is a member of the p64 family and is predominantly localized in the nucleus and stimulates chloride ion channel activity. In addition, this protein may participate in cellular growth control, based on its association with ERK7, a member of the MAP kinase family. CLIC3 ENSG00000169583 NA
apoptosis-associated tyrosine kinase 9625 The protein encoded by this gene contains a tyrosine kinase domain at the N-terminus and a proline-rich domain at the C-terminus. This gene is induced during apoptosis, and expression of this gene may be a necessary pre-requisite for the induction of growth arrest and/or apoptosis of myeloid precursor cells. This gene has been shown to produce neuronal differentiation in a neuroblastoma cell line. Two transcript variants encoding different isoforms have been found for this gene. AATK ENSG00000181409 NA
lysozyme 4069 This gene encodes human lysozyme, whose natural substrate is the bacterial cell wall peptidoglycan (cleaving the beta[1-4]glycosidic linkages between N-acetylmuramic acid and N-acetylglucosamine). Lysozyme is one of the antimicrobial agents found in human milk, and is also present in spleen, lung, kidney, white blood cells, plasma, saliva, and tears. The protein has antibacterial activity against a number of bacterial species. Missense mutations in this gene have been identified in heritable renal amyloidosis. LYZ ENSG00000090382 NA
ribonuclease A family member 2 6036 The protein encoded by this gene is a non-secretory ribonuclease that belongs to the pancreatic ribonuclease family, a subset of the ribonuclease A superfamily. The protein has antimicrobial activity against viruses. RNASE2 ENSG00000169385 NA
small integral membrane protein 5 643008 NA SMIM5 ENSG00000204323 NA
NA ENSG00000257764 NA RP11-1143G9.4 ENSG00000257764 NA
cytidine deaminase 978 This gene encodes an enzyme involved in pyrimidine salvaging. The encoded protein forms a homotetramer that catalyzes the irreversible hydrolytic deamination of cytidine and deoxycytidine to uridine and deoxyuridine, respectively. It is one of several deaminases responsible for maintaining the cellular pyrimidine pool. Mutations in this gene are associated with decreased sensitivity to the cytosine nucleoside analogue cytosine arabinoside used in the treatment of certain childhood leukemias. CDA ENSG00000158825 NA
PDZ and LIM domain 4 8572 This gene encodes a protein which may be involved in bone development. Mutations in this gene are associated with susceptibility to osteoporosis. PDLIM4 ENSG00000131435 NA
ChaC glutathione specific gamma-glutamylcyclotransferase 1 79094 NA CHAC1 ENSG00000128965 NA
CDC42 effector protein 5 148170 Cell division control protein 42 (CDC42), a small Rho GTPase, regulates the formation of F-actin-containing structures through its interaction with the downstream effector proteins. The protein encoded by this gene is a member of the Borg (binder of Rho GTPases) family of CDC42 effector proteins. Borg family proteins contain a CRIB (Cdc42/Rac interactive-binding) domain. They bind to CDC42 and regulate its function negatively. The encoded protein may inhibit c-Jun N-terminal kinase (JNK) independently of CDC42 binding. The protein may also play a role in septin organization and inducing pseudopodia formation in fibroblasts. CDC42EP5 ENSG00000167617 NA
leukemia inhibitory factor 3976 The protein encoded by this gene is a pleiotropic cytokine with roles in several different systems. It is involved in the induction of hematopoietic differentiation in normal and myeloid leukemia cells, induction of neuronal cell differentiation, regulator of mesenchymal to epithelial conversion during kidney development, and may also have a role in immune tolerance at the maternal-fetal interface. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. LIF ENSG00000128342 NA
collagen type I alpha 1 1277 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. COL1A1 ENSG00000108821 NA
family with sequence similarity 83 member D 81610 NA FAM83D ENSG00000101447 NA
AE binding protein 1 165 This gene encodes a member of carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. AEBP1 ENSG00000106624 NA
tubulin beta 6 class V 84617 NA TUBB6 ENSG00000176014 NA
transmembrane protein 52 339456 NA TMEM52 ENSG00000178821 NA
immunoglobulin superfamily member 6 10261 NA IGSF6 ENSG00000140749 NA
glycoprotein hormones, alpha polypeptide 1081 The four human glycoprotein hormones chorionic gonadotropin (CG), luteinizing hormone (LH), follicle stimulating hormone (FSH), and thyroid stimulating hormone (TSH) are dimers consisting of alpha and beta subunits that are associated noncovalently. The alpha subunits of these hormones are identical; however, their beta chains are unique and confer biological specificity. The protein encoded by this gene is the alpha subunit and belongs to the glycoprotein hormones alpha chain family. Two transcript variants encoding different isoforms have been found for this gene. CGA ENSG00000135346 NA
RAP1 GTPase activating protein 5909 This gene encodes a type of GTPase-activating-protein (GAP) that down-regulates the activity of the ras-related RAP1 protein. RAP1 acts as a molecular switch by cycling between an inactive GDP-bound form and an active GTP-bound form. The product of this gene, RAP1GAP, promotes the hydrolysis of bound GTP and hence returns RAP1 to the inactive state whereas other proteins, guanine nucleotide exchange factors (GEFs), act as RAP1 activators by facilitating the conversion of RAP1 from the GDP- to the GTP-bound form. In general, ras subfamily proteins, such as RAP1, play key roles in receptor-linked signaling pathways that control cell growth and differentiation. RAP1 plays a role in diverse processes such as cell proliferation, adhesion, differentiation, and embryogenesis. Alternative splicing results in multiple transcript variants encoding distinct proteins. RAP1GAP ENSG00000076864 NA
ATPase phospholipid transporting 8B1 5205 This gene encodes a member of the P-type cation transport ATPase family, which belongs to the subfamily of aminophospholipid-transporting ATPases. The aminophospholipid translocases transport phosphatidylserine and phosphatidylethanolamine from one side of a bilayer to another. Mutations in this gene may result in progressive familial intrahepatic cholestasis type 1 and in benign recurrent intrahepatic cholestasis. ATP8B1 ENSG00000081923 NA
transmembrane 4 L six family member 1 4071 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. This encoded protein is a cell surface antigen and is highly expressed in different carcinomas. TM4SF1 ENSG00000169908 NA
colony stimulating factor 3 receptor 1441 The protein encoded by this gene is the receptor for colony stimulating factor 3, a cytokine that controls the production, differentiation, and function of granulocytes. The encoded protein, which is a member of the family of cytokine receptors, may also function in some cell surface adhesion or recognition processes. Alternatively spliced transcript variants have been described. Mutations in this gene are a cause of Kostmann syndrome, also known as severe congenital neutropenia. CSF3R ENSG00000119535 NA
dermatopontin 1805 Dermatopontin is an extracellular matrix protein with possible functions in cell-matrix interactions and matrix assembly. The protein is found in various tissues and many of its tyrosine residues are sulphated. Dermatopontin is postulated to modify the behavior of TGF-beta through interaction with decorin. DPT ENSG00000143196 NA
GTP binding protein overexpressed in skeletal muscle 2669 The protein encoded by this gene belongs to the RAD/GEM family of GTP-binding proteins. It is associated with the inner face of the plasma membrane and could play a role as a regulatory protein in receptor-mediated signal transduction. Alternative splicing occurs at this locus and two transcript variants encoding the same protein have been identified. GEM ENSG00000164949 NA
alpha 1,4-galactosyltransferase 53947 The protein encoded by this gene catalyzes the transfer of galactose to lactosylceramide to form globotriaosylceramide, which has been identified as the P(k) antigen of the P blood group system. This protein, a type II membrane protein found in the Golgi, is also required for the synthesis of the bacterial verotoxins receptor. Alternatively spliced transcript variants have been found for this gene. A4GALT ENSG00000128274 NA
CD109 molecule 135228 This gene encodes a glycosyl phosphatidylinositol (GPI)-linked glycoprotein that localizes to the surface of platelets, activated T-cells, and endothelial cells. The protein binds to and negatively regulates signalling by transforming growth factor beta (TGF-beta). Multiple transcript variants encoding different isoforms have been found for this gene. CD109 ENSG00000156535 NA
tenascin C 3371 This gene encodes an extracellular matrix protein with a spatially and temporally restricted tissue distribution. This protein is homohexameric with disulfide-linked subunits, and contains multiple EGF-like and fibronectin type-III domains. It is implicated in guidance of migrating neurons as well as axons during development, synaptic plasticity, and neuronal regeneration. TNC ENSG00000041982 NA
filamin binding LIM protein 1 54751 This gene encodes a protein with an N-terminal filamin-binding domain, a central proline-rich domain, and multiple C-terminal LIM domains. This protein localizes at cell junctions and may link cell adhesion structures to the actin cytoskeleton. This protein may be involved in the assembly and stabilization of actin-filaments and likely plays a role in modulating cell adhesion, cell morphology and cell motility. This protein also localizes to the nucleus and may affect cardiomyocyte differentiation after binding with the CSX/NKX2-5 transcription factor. Alternative splicing results in multiple transcript variants encoding different isoforms. FBLIM1 ENSG00000162458 NA
elastin microfibril interfacer 1 11117 This gene encodes an extracellular matrix glycoprotein that is characterized by an N-terminal microfibril interface domain, a coiled-coil alpha-helical domain, a collagenous domain and a C-terminal globular C1q domain. The encoded protein associates with elastic fibers at the interface between elastin and microfibrils and may play a role in the development of elastic tissues including large blood vessels, dermis, heart and lung. EMILIN1 ENSG00000138080 NA
peptidylprolyl isomerase C 5480 The protein encoded by this gene is a member of the peptidyl-prolyl cis-trans isomerase (PPIase) family. PPIases catalyze the cis-trans isomerization of proline imidic peptide bonds in oligopeptides and accelerate the folding of proteins. Similar to other PPIases, this protein can bind immunosuppressant cyclosporin A. PPIC ENSG00000168938 NA
insulin like growth factor binding protein 4 3487 This gene is a member of the insulin-like growth factor binding protein (IGFBP) family and encodes a protein with an IGFBP domain and a thyroglobulin type-I domain. The protein binds both insulin-like growth factors (IGFs) I and II and circulates in the plasma in both glycosylated and non-glycosylated forms. Binding of this protein prolongs the half-life of the IGFs and alters their interaction with cell surface receptors. IGFBP4 ENSG00000141753 NA
BTG family member 3 10950 The protein encoded by this gene is a member of the BTG/Tob family. This family has structurally related proteins that appear to have antiproliferative properties. This encoded protein might play a role in neurogenesis in the central nervous system. Two transcript variants encoding different isoforms have been found for this gene. BTG3 ENSG00000154640 NA
microfibrillar associated protein 4 4239 This gene encodes a protein with similarity to a bovine microfibril-associated protein. The protein has binding specificities for both collagen and carbohydrate. It is thought to be an extracellular matrix protein which is involved in cell adhesion or intercellular interactions. The gene is located within the Smith-Magenis syndrome region. Two transcript variants encoding different isoforms have been found for this gene. MFAP4 ENSG00000166482 NA
trophoblast glycoprotein 7162 This gene encodes a leucine-rich transmembrane glycoprotein that may be involved in cell adhesion. The encoded protein is an oncofetal antigen that is specific to trophoblast cells. In adults this protein is highly expressed in many tumor cells and is associated with poor clinical outcome in numerous cancers. Alternate splicing in the 5’ UTR results in multiple transcript variants that encode the same protein. TPBG ENSG00000146242 NA
pleckstrin homology like domain family A member 2 7262 This gene is located in a cluster of imprinted genes on chromosome 11p15.5, which is considered to be an important tumor suppressor gene region. Alterations in this region may be associated with the Beckwith-Wiedemann syndrome, Wilms tumor, rhabdomyosarcoma, adrenocortical carcinoma, and lung, ovarian, and breast cancer. This gene has been shown to be imprinted, with preferential expression from the maternal allele in placenta and liver. PHLDA2 ENSG00000181649 NA
collagen type V alpha 1 1289 This gene encodes an alpha chain for one of the low abundance fibrillar collagens. Fibrillar collagen molecules are trimers that can be composed of one or more types of alpha chains. Type V collagen is found in tissues containing type I collagen and appears to regulate the assembly of heterotypic fibers composed of both type I and type V collagen. This gene product is closely related to type XI collagen and it is possible that the collagen chains of types V and XI constitute a single collagen type with tissue-specific chain combinations. The encoded procollagen protein occurs commonly as the heterotrimer pro-alpha1(V)-pro-alpha1(V)-pro-alpha2(V). Mutations in this gene are associated with Ehlers-Danlos syndrome, types I and II. Alternative splicing of this gene results in multiple transcript variants. COL5A1 ENSG00000130635 NA
TOX high mobility group box family member 2 84969 NA TOX2 ENSG00000124191 NA
matrix metallopeptidase 23B 8510 This gene (MMP23B) encodes a member of the matrix metalloproteinase (MMP) family, and it is part of a duplicated region of chromosome 1p36.3. Proteins of the matrix metalloproteinase (MMP) family are involved in the breakdown of extracellular matrix in normal physiological processes, such as embryonic development, reproduction, and tissue remodeling, as well as in disease processes, such as arthritis and metastasis. This gene belongs to the more telomeric copy of the duplicated region. MMP23B ENSG00000189409 NA
protein phosphatase 1 regulatory inhibitor subunit 1A 5502 NA PPP1R1A ENSG00000135447 NA
prolactin 5617 This gene encodes the anterior pituitary hormone prolactin. This secreted hormone is a growth regulator for many tissues, including cells of the immune system. It may also play a role in cell survival by suppressing apoptosis, and it is essential for lactation. Alternative splicing results in multiple transcript variants that encode the same protein. PRL ENSG00000172179 NA
collagen type XV alpha 1 chain 1306 This gene encodes the alpha chain of type XV collagen, a member of the FACIT collagen family (fibril-associated collagens with interrupted helices). Type XV collagen has a wide tissue distribution but the strongest expression is localized to basement membrane zones so it may function to adhere basement membranes to underlying connective tissue stroma. The proteolytically produced C-terminal fragment of type XV collagen is restin, a potentially antiangiogenic protein that is closely related to endostatin. Mouse studies have shown that collagen XV deficiency is associated with muscle and microvessel deterioration. COL15A1 ENSG00000204291 NA
small nucleolar RNA, H/ACA box 73B ENSG00000200087 NA SNORA73B ENSG00000200087 NA
interleukin 12A 3592 This gene encodes a subunit of a cytokine that acts on T and natural killer cells, and has a broad array of biological activities. The cytokine is a disulfide-linked heterodimer composed of the 35-kD subunit encoded by this gene, and a 40-kD subunit that is a member of the cytokine receptor family. This cytokine is required for the T-cell-independent induction of interferon (IFN)-gamma, and is important for the differentiation of both Th1 and Th2 cells. The responses of lymphocytes to this cytokine are mediated by the activator of transcription protein STAT4. Nitric oxide synthase 2A (NOS2A/NOS2) is found to be required for the signaling process of this cytokine in innate immunity. IL12A ENSG00000168811 NA
atypical chemokine receptor 3 57007 This gene encodes a member of the G-protein coupled receptor family. Although this protein was earlier thought to be a receptor for vasoactive intestinal peptide (VIP), it is now considered to be an orphan receptor, in that its endogenous ligand has not been identified. The protein is also a coreceptor for human immunodeficiency viruses (HIV). Translocations involving this gene and HMGA2 on chromosome 12 have been observed in lipomas. ACKR3 ENSG00000144476 NA
retinoic acid receptor responder 2 5919 This gene encodes a secreted chemotactic protein that initiates chemotaxis via the ChemR23 G protein-coupled seven-transmembrane domain ligand. Expression of this gene is upregulated by the synthetic retinoid tazarotene and occurs in a wide variety of tissues. The active protein has several roles, including that as an adipokine and as an antimicrobial protein with activity against bacteria and fungi. RARRES2 ENSG00000106538 NA
collagen type V alpha 2 chain 1290 This gene encodes an alpha chain for one of the low abundance fibrillar collagens. Fibrillar collagen molecules are trimers that can be composed of one or more types of alpha chains. Type V collagen is found in tissues containing type I collagen and appears to regulate the assembly of heterotypic fibers composed of both type I and type V collagen. This gene product is closely related to type XI collagen and it is possible that the collagen chains of types V and XI constitute a single collagen type with tissue-specific chain combinations. Mutations in this gene are associated with Ehlers-Danlos syndrome, types I and II. COL5A2 ENSG00000204262 NA
cysteine and glycine rich protein 2 1466 CSRP2 is a member of the CSRP family of genes, encoding a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. CRP2 contains two copies of the cysteine-rich amino acid sequence motif (LIM) with putative zinc-binding activity, and may be involved in regulating ordered cell growth. Other genes in the family include CSRP1 and CSRP3. Alternative splicing results in multiple transcript variants. CSRP2 ENSG00000175183 NA
calponin 3 1266 This gene encodes a protein with a markedly acidic C terminus; the basic N-terminus is highly homologous to the N-terminus of a related gene, CNN1. Members of the CNN gene family all contain similar tandemly repeated motifs. This encoded protein is associated with the cytoskeleton but is not involved in contraction. CNN3 ENSG00000117519 NA
collagen type XII alpha 1 chain 1303 This gene encodes the alpha chain of type XII collagen, a member of the FACIT (fibril-associated collagens with interrupted triple helices) collagen family. Type XII collagen is a homotrimer found in association with type I collagen, an association that is thought to modify the interactions between collagen I fibrils and the surrounding matrix. Alternatively spliced transcript variants encoding different isoforms have been identified. COL12A1 ENSG00000111799 NA
PDZ domain containing ring finger 3 23024 This gene encodes a member of the LNX (Ligand of Numb Protein-X) family of RING-type ubiquitin E3 ligases. This protein may function in vascular morphogenesis and the differentiation of adipocytes, osteoblasts and myoblasts. This protein may be targeted for degradation by the human papilloma virus E6 protein. Alternative splicing results in multiple transcript variants. PDZRN3 ENSG00000121440 NA
oleoyl-ACP hydrolase 55301 NA OLAH ENSG00000152463 NA
NOTCH-regulated ankyrin repeat protein 441478 NA NRARP ENSG00000198435 NA
endothelin converting enzyme 2 9718 This gene encodes a member of the M13 family, which includes type 2 integral membrane metallopeptidases. The encoded enzyme is a membrane-bound zinc-dependent metalloprotease. The enzyme catalyzes the cleavage of big endothelin to produce the vasoconstrictor endothelin-1, and plays a role in the processing of several neuroendocrine peptides. It may also have methyltransferase activity. Alternative splicing results in multiple transcript variants. ECE2 ENSG00000145194 NA
delta like canonical Notch ligand 1 28514 DLL1 is a human homolog of the Notch Delta ligand and is a member of the delta/serrate/jagged family. It plays a role in mediating cell fate decisions during hematopoiesis. It may play a role in cell-to-cell communication. DLL1 ENSG00000198719 NA
prostaglandin-endoperoxide synthase 2 5743 Prostaglandin-endoperoxide synthase (PTGS), also known as cyclooxygenase, is the key enzyme in prostaglandin biosynthesis, and acts both as a dioxygenase and as a peroxidase. There are two isozymes of PTGS: a constitutive PTGS1 and an inducible PTGS2, which differ in their regulation of expression and tissue distribution. This gene encodes the inducible isozyme. It is regulated by specific stimulatory events, suggesting that it is responsible for the prostanoid biosynthesis involved in inflammation and mitogenesis. PTGS2 ENSG00000073756 NA
naked cuticle homolog 2 85409 This gene encodes a member of a family of proteins that function as negative regulators of Wnt receptor signaling through interaction with Dishevelled family members. The encoded protein participates in the delivery of transforming growth factor alpha-containing vesicles to the cell membrane. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NKD2 ENSG00000145506 NA
purinergic receptor P2Y12 64805 The product of this gene belongs to the family of G-protein coupled receptors. This family has several receptor subtypes with different pharmacological selectivity, which overlaps in some cases, for various adenosine and uridine nucleotides. This receptor is involved in platelet aggregation, and is a potential target for the treatment of thromboembolisms and other clotting disorders. Mutations in this gene are implicated in bleeding disorder, platelet type 8 (BDPLT8). Alternative splicing results in multiple transcript variants of this gene. P2RY12 ENSG00000169313 NA
WWTR1 antisense RNA 1 100128025 NA WWTR1-AS1 ENSG00000241313 NA
NA ENSG00000259712 NA CTD-2184D3.5 ENSG00000259712 NA
ATP2A1 antisense RNA 1 100289092 NA ATP2A1-AS1 ENSG00000260442 NA
fibulin 1 2192 Fibulin 1 is a secreted glycoprotein that becomes incorporated into a fibrillar extracellular matrix. Calcium-binding is apparently required to mediate its binding to laminin and nidogen. It mediates platelet adhesion via binding fibrinogen. Four splice variants which differ in the 3’ end have been identified. Each variant encodes a different isoform, but no functional distinctions have been identified among the four variants. FBLN1 ENSG00000077942 NA
prostaglandin I2 (prostacyclin) receptor (IP) 5739 The protein encoded by this gene is a member of the G-protein coupled receptor family 1 and has been shown to be a receptor for prostacyclin. Prostacyclin, the major product of cyclooxygenase in macrovascular endothelium, elicits a potent vasodilation and inhibition of platelet aggregation through binding to this receptor. PTGIR ENSG00000160013 NA
vascular cell adhesion molecule 1 7412 This gene is a member of the Ig superfamily and encodes a cell surface sialoglycoprotein expressed by cytokine-activated endothelium. This type I membrane protein mediates leukocyte-endothelial cell adhesion and signal transduction, and may play a role in the development of atherosclerosis and rheumatoid arthritis. Three alternatively spliced transcripts encoding different isoforms have been described for this gene. VCAM1 ENSG00000162692 NA
DNA damage inducible transcript 4 like 115265 NA DDIT4L ENSG00000145358 NA
chromosome 8 open reading frame 4 56892 This gene encodes a small, monomeric, predominantly unstructured protein that functions as a positive regulator of the Wnt/beta-catenin signaling pathway. This protein interacts with a repressor of beta-catenin mediated transcription at nuclear speckles. It is thought to competitively block interactions of the repressor with beta-catenin, resulting in up-regulation of beta-catenin target genes. The encoded protein may also play a role in the NF-kappaB and ERK1/2 signaling pathways. Expression of this gene may play a role in the proliferation of several types of cancer including thyroid cancer, breast cancer and hematological malignancies. C8orf4 ENSG00000176907 NA
immediate early response 3 8870 This gene functions in the protection of cells from Fas- or tumor necrosis factor type alpha-induced apoptosis. Partially degraded and unspliced transcripts are found after virus infection in vitro, but these transcripts are not found in vivo and do not generate a valid protein. IER3 ENSG00000137331 NA
coiled-coil domain containing 102B 79839 NA CCDC102B ENSG00000150636 NA
serpin family A member 1 5265 The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. SERPINA1 ENSG00000197249 NA
transmembrane protein 266 123591 NA TMEM266 ENSG00000169758 NA
filamin A interacting protein 1 like 11259 NA FILIP1L ENSG00000168386 NA
hes family bHLH transcription factor 4 57801 NA HES4 ENSG00000188290 NA
fibrillin 1 2200 This gene encodes a member of the fibrillin family of proteins. The encoded preproprotein is proteolytically processed to generate two proteins including the extracellular matrix component fibrillin-1 and the protein hormone asprosin. Fibrillin-1 is an extracellular matrix glycoprotein that serves as a structural component of calcium-binding microfibrils. These microfibrils provide force-bearing structural support in elastic and nonelastic connective tissue throughout the body. Asprosin, secreted by white adipose tissue, has been shown to regulate glucose homeostasis. Mutations in this gene are associated with Marfan syndrome and the related MASS phenotype, as well as ectopia lentis syndrome, Weill-Marchesani syndrome, Shprintzen-Goldberg syndrome and neonatal progeroid syndrome. FBN1 ENSG00000166147 NA
SH2 domain containing adaptor protein B 6461 NA SHB ENSG00000107338 NA
PPFIA binding protein 1 8496 The protein encoded by this gene is a member of the LAR protein-tyrosine phosphatase-interacting protein (liprin) family. Liprins interact with members of the LAR family of transmembrane protein tyrosine phosphatases, which are known to be important for axon guidance and mammary gland development. It has been proposed that liprins are multivalent proteins that form complex structures and act as scaffolds for the recruitment and anchoring of the LAR family of tyrosine phosphatases. This protein was found to interact with S100A4, a calcium-binding protein related to tumor invasiveness and metastasis. An in vitro experiment demonstrated that the interaction inhibited the phosphorylation of this protein by protein kinase C and protein kinase CK2. Alternatively spliced transcript variants encoding distinct isoforms have been reported. PPFIBP1 ENSG00000110841 NA
vasorin 114990 NA VASN ENSG00000168140 NA
lumican 4060 This gene encodes a member of the small leucine-rich proteoglycan (SLRP) family that includes decorin, biglycan, fibromodulin, keratocan, epiphycan, and osteoglycin. In these bifunctional molecules, the protein moiety binds collagen fibrils and the highly charged hydrophilic glycosaminoglycans regulate interfibrillar spacings. Lumican is the major keratan sulfate proteoglycan of the cornea but is also distributed in interstitial collagenous matrices throughout the body. Lumican may regulate collagen fibril organization and circumferential growth, corneal transparency, and epithelial cell migration and tissue repair. LUM ENSG00000139329 NA
NA ENSG00000269680 NA CTD-3128G10.6 ENSG00000269680 NA
collagen type I alpha 2 chain 1278 This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. COL1A2 ENSG00000164692 NA
neuronatin 4826 The protein encoded by this gene is a proteolipid that may be involved in the regulation of ion channels during brain development. The encoded protein may also play a role in forming and maintaining the structure of the nervous system. This gene is found within an intron of another gene, bladder cancer associated protein, but on the opposite strand. This gene is imprinted and is expressed only from the paternal allele. NNAT ENSG00000053438 NA
mitogen-activated protein kinase 12 6300 Activation of members of the mitogen-activated protein kinase family is a major mechanism for transduction of extracellular signals. Stress-activated protein kinases are one subclass of MAP kinases. The protein encoded by this gene functions as a signal transducer during differentiation of myoblasts to myotubes. MAPK12 ENSG00000188130 NA
ADAM metallopeptidase with thrombospondin type 1 motif 7 11173 The protein encoded by this gene is a member of the ADAMTS (a disintegrin and metalloproteinase with thrombospondin motifs) family. Members of this family share several distinct protein modules, including a propeptide region, a metalloproteinase domain, a disintegrin-like domain, and a thrombospondin type 1 (TS) motif. Individual members of this family differ in the number of C-terminal TS motifs, and some have unique C-terminal domains. The encoded preproprotein is proteolytically processed to generate the mature enzyme. This enzyme contains two C-terminal TS motifs and may regulate vascular smooth muscle cell (VSMC) migration. Mutations in this gene may be associated with susceptibility to coronary artery disease. ADAMTS7 ENSG00000136378 NA
hes family bHLH transcription factor 1 3280 This protein belongs to the basic helix-loop-helix family of transcription factors. It is a transcriptional repressor of genes that require a bHLH protein for their transcription. The protein has a particular type of basic domain that contains a helix interrupting protein that binds to the N-box rather than the canonical E-box. HES1 ENSG00000114315 NA
NA ENSG00000249996 NA RP11-359P5.1 ENSG00000249996 NA
matrix metallopeptidase 2 4313 This gene is a member of the matrix metalloproteinase (MMP) gene family, that are zinc-dependent enzymes capable of cleaving components of the extracellular matrix and molecules involved in signal transduction. The protein encoded by this gene is a gelatinase A, type IV collagenase, that contains three fibronectin type II repeats in its catalytic site that allow binding of denatured type IV and V collagen and elastin. Unlike most MMP family members, activation of this protein can occur on the cell membrane. This enzyme can be activated extracellularly by proteases, or, intracellulary by its S-glutathiolation with no requirement for proteolytical removal of the pro-domain. This protein is thought to be involved in multiple pathways including roles in the nervous system, endometrial menstrual breakdown, regulation of vascularization, and metastasis. Mutations in this gene have been associated with Winchester syndrome and Nodulosis-Arthropathy-Osteolysis (NAO) syndrome. Alternative splicing results in multiple transcript variants encoding different isoforms. MMP2 ENSG00000087245 NA
NA ENSG00000242590 NA RP11-54O7.14 ENSG00000242590 NA
glycoprotein nmb 10457 The protein encoded by this gene is a type I transmembrane glycoprotein which shows homology to the pMEL17 precursor, a melanocyte-specific protein. GPNMB shows expression in the lowly metastatic human melanoma cell lines and xenografts but does not show expression in the highly metastatic cell lines. GPNMB may be involved in growth delay and reduction of metastatic potential. Two transcript variants encoding different isoforms have been found for this gene. GPNMB ENSG00000136235 NA
agrin 375790 This gene encodes one of several proteins that are critical in the development of the neuromuscular junction (NMJ), as identified in mouse knock-out studies. The encoded protein contains several laminin G, Kazal type serine protease inhibitor, and epidermal growth factor domains. Additional post-translational modifications occur to add glycosaminoglycans and disulfide bonds. In one family with congenital myasthenic syndrome affecting limb-girdle muscles, a mutation in this gene was found. Alternative splicing results in multiple transcript variants encoding different isoforms. AGRN ENSG00000188157 NA
G protein subunit gamma 12 55970 NA GNG12 ENSG00000172380 NA
collagen type III alpha 1 chain 1281 This gene encodes the pro-alpha1 chains of type III collagen, a fibrillar collagen that is found in extensible connective tissues such as skin, lung, uterus, intestine and the vascular system, frequently in association with type I collagen. Mutations in this gene are associated with Ehlers-Danlos syndrome types IV, and with aortic and arterial aneurysms. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. COL3A1 ENSG00000168542 NA
semaphorin 3F 6405 This gene encodes a member of the semaphorin III family of secreted signaling proteins that are involved in axon guidance during neuronal development. The encoded protein contains an N-terminal Sema domain, an immunoglobulin loop and a C-terminal basic domain. This gene is expressed by the endothelial cells where it was found to act in an autocrine fashion to induce apoptosis, inhibit cell proliferation and survival, and function as an anti-tumorigenic agent. Alternative splicing results in multiple transcript variants encoding different isoforms. SEMA3F ENSG00000001617 NA
WW domain containing transcription regulator 1 25937 NA WWTR1 ENSG00000018408 NA
PDZ and LIM domain 1 9124 This gene encodes a member of the enigma protein family. The protein contains two protein interacting domains, a PDZ domain at the amino terminal end and one to three LIM domains at the carboxyl terminal. It is a cytoplasmic protein associated with the cytoskeleton. The protein may function as an adapter to bring other LIM-interacting proteins to the cytoskeleton. Pseudogenes associated with this gene are located on chromosomes 3, 14 and 17. PDLIM1 ENSG00000107438 NA
potassium voltage-gated channel subfamily J member 12 3768 This gene encodes an inwardly rectifying K+ channel which may be blocked by divalent cations. This protein is thought to be one of multiple inwardly rectifying channels which contribute to the cardiac inward rectifier current (IK1). The gene is located within the Smith-Magenis syndrome region on chromosome 17. KCNJ12 ENSG00000184185 NA
NA NA NA NA ENSG00000255905 TRUE
UDP-glucose 6-dehydrogenase 7358 The protein encoded by this gene converts UDP-glucose to UDP-glucuronate and thereby participates in the biosynthesis of glycosaminoglycans such as hyaluronan, chondroitin sulfate, and heparan sulfate. These glycosylated compounds are common components of the extracellular matrix and likely play roles in signal transduction, cell migration, and cancer growth and metastasis. The expression of this gene is up-regulated by transforming growth factor beta and down-regulated by hypoxia. Alternative splicing results in multiple transcript variants. UGDH ENSG00000109814 NA
SPARC related modular calcium binding 2 64094 This gene encodes a member of the SPARC family (secreted protein acidic and rich in cysteine/osteonectin/BM-40), which are highly expressed during embryogenesis and wound healing. The gene product is a matricellular protein which promotes matrix assembly and can stimulate endothelial cell proliferation and migration, as well as angiogenic activity. Associated with pulmonary function, this secretory gene product contains a Kazal domain, two thymoglobulin type-1 domains, and two EF-hand calcium-binding domains. The encoded protein may serve as a target for controlling angiogenesis in tumor growth and myocardial ischemia. Alternative splicing results in multiple transcript variants. SMOC2 ENSG00000112562 NA
retinol binding protein 1 5947 This gene encodes the carrier protein involved in the transport of retinol (vitamin A alcohol) from the liver storage site to peripheral tissue. Vitamin A is a fat-soluble vitamin necessary for growth, reproduction, differentiation of epithelial tissues, and vision. Multiple transcript variants encoding different isoforms have been found for this gene. RBP1 ENSG00000114115 NA
NA ENSG00000266101 NA RP5-906A24.2 ENSG00000266101 NA
neuralized E3 ubiquitin protein ligase 1 9148 NA NEURL1 ENSG00000107954 NA
# Save the queried Ensembl IDs, one per line.
write.table(out$query,
            file = paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 20, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)

Transpose the matrix to ensure sparse factors

##  Voom counts transpose

#Expected complete Log likelihood at iteration 100: -8.18694e+07
#Marginal log likelihood at iteration 100: inf
#Residual variance at iteration 100: 1.72648
#Residual sum of squares at iteration 100: 6.53386e+07

##  Sqrt counts transpose

# Expected complete Log likelihood at iteration 100: -3.9495e+08
# Marginal log likelihood at iteration 100: -inf
# Residual variance at iteration 100: 638.648
# Residual sum of squares at iteration 100: 2.35575e+10

## counts transpose

GTEx 2013 Factor analysis (sparse factors: sqrt counts)

# Load the SFA output fitted on the transposed sqrt-count matrix.
lambda_out <- read.table("../sfa_outputs/GTEX2013_transpose/sqrt_counts_gtex/gtex_sqrt_counts_transpose_lambda.out")
f_out <- read.table("../sfa_outputs/GTEX2013_transpose/sqrt_counts_gtex/gtex_sqrt_counts_transpose_F.out")

# Read the gene IDs and keep the first 15 characters of each.
gene_names <- as.vector(as.matrix(read.table("../sfa_inputs/gene_names_GTEX_V6.txt")))
gene_names <- substring(gene_names, 1, 15)
xli <- gene_names
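The `substring()` call above keeps the 15-character stable Ensembl gene ID (`ENSG` plus 11 digits) and drops anything after it, such as a version suffix. A minimal sketch, assuming the input file holds versioned IDs of the form `ENSG00000197249.8` (the version numbers below are made up for illustration):

```r
# Hypothetical versioned IDs; the numeric suffixes are illustrative only.
versioned_ids <- c("ENSG00000197249.8", "ENSG00000169758.12")

# Fixed-width trim, as in the analysis code.
trimmed_fixed <- substring(versioned_ids, 1, 15)

# Alternative that does not depend on the ID width:
# strip a trailing ".<digits>" if one is present.
trimmed_regex <- sub("\\.[0-9]+$", "", versioned_ids)

print(trimmed_fixed)
```

The fixed-width trim is enough for human Ensembl gene IDs; the regular-expression form may be preferable if IDs of another width could ever appear in the file.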


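# Indices of the top 100 genes for each factor, one row per factor.
indices_mat <- SFA.ExtractTopFeatures(lambda_out, top_features = 100,
                                      options = "min", mult.annotate = TRUE)

# Map the index matrix to gene IDs, one row per factor.
gene_list <- do.call(rbind, lapply(seq_len(nrow(indices_mat)),
                                   function(x) gene_names[indices_mat[x, ]]))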
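`SFA.ExtractTopFeatures` is sourced from the project's own R files and its internals are not shown here. As a rough illustration of the general idea only, the sketch below ranks genes by absolute loading within each factor and builds a factors-by-genes index matrix with the same layout as `indices_mat`. The helper name `top_abs_features` and the toy matrix are hypothetical, and the ranking rule is an assumption that may differ from what `options = "min"` does.

```r
# Hypothetical helper: for each column (factor) of a genes x factors matrix,
# return the row indices of the `top` largest absolute loadings.
top_abs_features <- function(loadings, top = 2) {
  loadings <- as.matrix(loadings)
  idx <- apply(loadings, 2,
               function(col) order(abs(col), decreasing = TRUE)[seq_len(top)])
  # apply() yields top x factors; transpose to factors x top.
  t(matrix(idx, nrow = top))
}

# Toy data: 4 genes, 2 factors (filled column by column).
toy_loadings <- matrix(c(0.1, -2.0,  0.5, 1.5,
                         3.0,  0.0, -0.2, 0.4),
                       ncol = 2)
toy_genes <- c("geneA", "geneB", "geneC", "geneD")

toy_idx <- top_abs_features(toy_loadings, top = 2)

# Same index-to-name mapping as used for gene_list.
toy_gene_list <- do.call(rbind, lapply(seq_len(nrow(toy_idx)),
                                       function(x) toy_genes[toy_idx[x, ]]))
print(toy_gene_list)
```

Each row of `toy_gene_list` then lists the selected genes for one factor, which is how `gene_list[1, ]` is used for the factor 1 annotations.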

SFA loadings plot

samples_id <- read.table("../sfa_inputs/samples_id.txt")

# The third column holds the tissue label of each sample.
tissue_labels <- samples_id[, 3]

tissue_levels <- unique(tissue_labels)

# Tissue block boundaries and midpoints along the x-axis. This assumes the
# samples are grouped by tissue in the same order that table() reports.
cumsum_val <- c(1, cumsum(as.numeric(table(tissue_labels))))
cumsum_low <- cumsum_val[1:(length(cumsum_val) - 1)]
cumsum_high <- cumsum_val[2:length(cumsum_val)]
cumsum_mean <- 0.5 * (cumsum_low + cumsum_high)
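The tick positions above pair counts from `table()`, which are ordered by sorted level, with labels from `unique()`, which are ordered by first appearance, so the two line up only when the samples are stored in sorted tissue order. As a hedged alternative, the sketch below derives the block boundaries from run lengths with `rle()`, which follows the order of the data, and starts at 0 because `barplot(..., space = 0)` draws bar i on the interval [i - 1, i]. The toy labels are made up, and the sketch still assumes that samples from one tissue are contiguous.

```r
# Toy tissue labels: contiguous by tissue, but not in alphabetical order.
# (For a factor column, wrap it in as.character() before calling rle().)
toy_labels <- c("Lung", "Lung", "Lung", "Brain", "Brain", "Heart")

runs <- rle(toy_labels)                    # run lengths in order of appearance
block_end <- cumsum(runs$lengths)          # right edge of each tissue block
block_start <- c(0, head(block_end, -1))   # left edge of each tissue block
block_mid <- 0.5 * (block_start + block_end)

print(data.frame(tissue = runs$values, start = block_start,
                 end = block_end, mid = block_mid))
```

With this layout, `block_mid` would play the role of `cumsum_mean`, `block_end` the role of `cumsum_high`, and `runs$values` would supply the axis labels.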

for (k in 1:20) {
  png(paste0("../sfa_outputs/GTEX2013_transpose/sfa-figures/sqrt_counts_sparse_fac_loadings/gtex_sfa_loadings_", k, ".png"),
      width = 4, height = 4, units = "in", res = 600)
  par(mar = c(10, 3, 2, 2))  # wide bottom margin for the rotated tissue labels
  # Row k of f_out holds the values of factor k across samples.
  fac_k <- unlist(f_out[k, ], use.names = FALSE)
  barplot(fac_k, axisnames = FALSE, space = 0, border = NA,
          main = paste0("SFA on gtex expression: loading:", k),
          las = 1, cex.axis = 0.3, cex.main = 0.4,
          ylim = range(fac_k))
  axis(1, at = cumsum_mean, labels = unique(tissue_labels), las = 2, cex.axis = 0.3)
  abline(v = cumsum_high)
  dev.off()
}
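The loop indexes rows of `f_out` by factor and relies on each row having one value per sample, in the same order as `tissue_labels`. A small sanity check along these lines can catch a transposed or truncated matrix before any figures are written. This is a sketch on toy data; the helper name `check_plot_inputs` is hypothetical.

```r
# Hypothetical helper: stop early if the factor matrix and the sample labels
# do not have compatible shapes for the plotting loop.
check_plot_inputs <- function(factors, labels, n_factors) {
  stopifnot(nrow(factors) >= n_factors,        # one row per plotted factor
            ncol(factors) == length(labels))   # one column per sample
  invisible(TRUE)
}

# Toy input: 3 factors x 6 samples.
toy_f <- as.data.frame(matrix(seq_len(18) / 10, nrow = 3))
toy_tissues <- c("Lung", "Lung", "Lung", "Brain", "Brain", "Heart")

shape_ok <- check_plot_inputs(toy_f, toy_tissues, n_factors = 3)

# A transposed matrix (6 x 3) fails the column check.
transposed_ok <- tryCatch(check_plot_inputs(t(toy_f), toy_tissues, n_factors = 3),
                          error = function(e) FALSE)
print(c(shape_ok, transposed_ok))
```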

Factor 1 Annotations

out <- mygene::queryMany(gene_list[1,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id name summary symbol query notfound
4629 myosin, heavy chain 11, smooth muscle The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. MYH11 ENSG00000133392 NA
1634 decorin This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. DCN ENSG00000011465 NA
60 actin, beta This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. ACTB ENSG00000075624 NA
1293 collagen type VI alpha 3 chain This gene encodes the alpha-3 chain, one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The alpha-3 chain of type VI collagen is much larger than the alpha-1 and -2 chains. This difference in size is largely due to an increase in the number of subdomains, similar to von Willebrand Factor type A domains, that are found in the amino terminal globular domain of all the alpha chains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in the type VI collagen genes are associated with Bethlem myopathy, a rare autosomal dominant proximal myopathy with early childhood onset. Mutations in this gene are also a cause of Ullrich congenital muscular dystrophy, also referred to as Ullrich scleroatonic muscular dystrophy, an autosomal recessive congenital myopathy that is more severe than Bethlem myopathy. Multiple transcript variants have been identified, but the full-length nature of only some of these variants has been described. COL6A3 ENSG00000163359 NA
6678 secreted protein acidic and cysteine rich This gene encodes a cysteine-rich acidic matrix-associated protein. The encoded protein is required for the collagen in bone to become calcified but is also involved in extracellular matrix synthesis and promotion of changes to cell shape. The gene product has been associated with tumor suppression but has also been correlated with metastasis based on changes to cell shape which can promote tumor cell invasion. Three transcript variants encoding different isoforms have been found for this gene. SPARC ENSG00000113140 NA
3936 lymphocyte cytosolic protein 1 Plastins are a family of actin-binding proteins that are conserved throughout eukaryote evolution and expressed in most tissues of higher eukaryotes. In humans, two ubiquitous plastin isoforms (L and T) have been identified. Plastin 1 (otherwise known as Fimbrin) is a third distinct plastin isoform which is specifically expressed at high levels in the small intestine. The L isoform is expressed only in hemopoietic cell lineages, while the T isoform has been found in all other normal cells of solid tissues that have replicative potential (fibroblasts, endothelial cells, epithelial cells, melanocytes, etc.). However, L-plastin has been found in many types of malignant human cells of non-hemopoietic origin suggesting that its expression is induced accompanying tumorigenesis in solid tissues. LCP1 ENSG00000136167 NA
7805 lysosomal protein transmembrane 5 This gene encodes a transmembrane receptor that is associated with lysosomes. The encoded protein, also known as E3 protein, may play a role in hematopoiesis. LAPTM5 ENSG00000162511 NA
1465 cysteine and glycine rich protein 1 This gene encodes a member of the cysteine-rich protein (CSRP) family. This gene family includes a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this gene product occurs in proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Alternatively spliced transcript variants have been described. CSRP1 ENSG00000159176 NA
5806 pentraxin 3 NA PTX3 ENSG00000163661 NA
347 apolipoprotein D This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. APOD ENSG00000189058 NA
567 beta-2-microglobulin This gene encodes a serum protein found in association with the major histocompatibility complex (MHC) class I heavy chain on the surface of nearly all nucleated cells. The protein has a predominantly beta-pleated sheet structure that can form amyloid fibrils in some pathological conditions. The encoded antimicrobial protein displays antibacterial activity in amniotic fluid. A mutation in this gene has been shown to result in hypercatabolic hypoproteinemia. B2M ENSG00000166710 NA
72 actin, gamma 2, smooth muscle, enteric Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. ACTG2 ENSG00000163017 NA
963 CD53 molecule The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. This encoded protein is a cell surface glycoprotein that is known to complex with integrins. It contributes to the transduction of CD2-generated signals in T cells and natural killer cells and has been suggested to play a role in growth regulation. Familial deficiency of this gene has been linked to an immunodeficiency associated with recurrent infectious diseases caused by bacteria, fungi and viruses. Alternative splicing results in multiple transcript variants. CD53 ENSG00000143119 NA
1278 collagen type I alpha 2 chain This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. COL1A2 ENSG00000164692 NA
6876 transgelin The protein encoded by this gene is a transformation and shape-change sensitive actin cross-linking/gelling protein found in fibroblasts and smooth muscle. Its expression is down-regulated in many cell lines, and this down-regulation may be an early and sensitive marker for the onset of transformation. A functional role of this protein is unclear. Two transcript variants encoding the same protein have been found for this gene. TAGLN ENSG00000149591 NA
57124 CD248 molecule NA CD248 ENSG00000174807 NA
6653 sortilin-related receptor, L(DLR class) A repeats containing This gene encodes a mosaic protein that belongs to at least two families: the vacuolar protein sorting 10 (VPS10) domain-containing receptor family, and the low density lipoprotein receptor (LDLR) family. The encoded protein also contains fibronectin type III repeats and an epidermal growth factor repeat. The encoded preproprotein is proteolytically processed to generate the mature receptor, which likely plays roles in endocytosis and sorting. Mutations in this gene may be associated with Alzheimer’s disease. SORL1 ENSG00000137642 NA
1303 collagen type XII alpha 1 chain This gene encodes the alpha chain of type XII collagen, a member of the FACIT (fibril-associated collagens with interrupted triple helices) collagen family. Type XII collagen is a homotrimer found in association with type I collagen, an association that is thought to modify the interactions between collagen I fibrils and the surrounding matrix. Alternatively spliced transcript variants encoding different isoforms have been identified. COL12A1 ENSG00000111799 NA
1410 crystallin alpha B Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. CRYAB ENSG00000109846 NA
58 actin, alpha 1, skeletal muscle The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. ACTA1 ENSG00000143632 NA
25802 leiomodin 1 The leiomodin 1 protein has a putative membrane-spanning region and 2 types of tandemly repeated blocks. The transcript is expressed in all tissues tested, with the highest levels in thyroid, eye muscle, skeletal muscle, and ovary. Increased expression of leiomodin 1 may be linked to Graves’ disease and thyroid-associated ophthalmopathy. LMOD1 ENSG00000163431 NA
11151 coronin 1A This gene encodes a member of the WD repeat protein family. WD repeats are minimally conserved regions of approximately 40 amino acids typically bracketed by gly-his and trp-asp (GH-WD), which may facilitate formation of heterotrimeric or multiprotein complexes. Members of this family are involved in a variety of cellular processes, including cell cycle progression, signal transduction, apoptosis, and gene regulation. Alternative splicing results in multiple transcript variants. A related pseudogene has been defined on chromosome 16. CORO1A ENSG00000102879 NA
65009 NDRG family member 4 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that is required for cell cycle progression and survival in primary astrocytes and may be involved in the regulation of mitogenic signalling in vascular smooth muscles cells. Alternative splicing results in multiple transcripts encoding different isoforms. NDRG4 ENSG00000103034 NA
5777 protein tyrosine phosphatase, non-receptor type 6 The protein encoded by this gene is a member of the protein tyrosine phosphatase (PTP) family. PTPs are known to be signaling molecules that regulate a variety of cellular processes including cell growth, differentiation, mitotic cycle, and oncogenic transformation. N-terminal part of this PTP contains two tandem Src homolog (SH2) domains, which act as protein phospho-tyrosine binding domains, and mediate the interaction of this PTP with its substrates. This PTP is expressed primarily in hematopoietic cells, and functions as an important regulator of multiple signaling pathways in hematopoietic cells. This PTP has been shown to interact with, and dephosphorylate a wide spectrum of phospho-proteins involved in hematopoietic cell signaling. Multiple alternatively spliced variants of this gene, which encode distinct isoforms, have been reported. PTPN6 ENSG00000111679 NA
3107 major histocompatibility complex, class I, C HLA-C belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon one encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domain, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region, and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Over one hundred HLA-C alleles have been described HLA-C ENSG00000204525 NA
5552 serglycin This gene encodes a protein best known as a hematopoietic cell granule proteoglycan. Proteoglycans stored in the secretory granules of many hematopoietic cells also contain a protease-resistant peptide core, which may be important for neutralizing hydrolytic enzymes. This encoded protein was found to be associated with the macromolecular complex of granzymes and perforin, which may serve as a mediator of granule-mediated apoptosis. Two transcript variants, only one of them protein-coding, have been found for this gene. SRGN ENSG00000122862 NA
2012 epithelial membrane protein 1 NA EMP1 ENSG00000134531 NA
4313 matrix metallopeptidase 2 This gene is a member of the matrix metalloproteinase (MMP) gene family, that are zinc-dependent enzymes capable of cleaving components of the extracellular matrix and molecules involved in signal transduction. The protein encoded by this gene is a gelatinase A, type IV collagenase, that contains three fibronectin type II repeats in its catalytic site that allow binding of denatured type IV and V collagen and elastin. Unlike most MMP family members, activation of this protein can occur on the cell membrane. This enzyme can be activated extracellularly by proteases, or, intracellulary by its S-glutathiolation with no requirement for proteolytical removal of the pro-domain. This protein is thought to be involved in multiple pathways including roles in the nervous system, endometrial menstrual breakdown, regulation of vascularization, and metastasis. Mutations in this gene have been associated with Winchester syndrome and Nodulosis-Arthropathy-Osteolysis (NAO) syndrome. Alternative splicing results in multiple transcript variants encoding different isoforms. MMP2 ENSG00000087245 NA
4035 LDL receptor related protein 1 This gene encodes a member of the low-density lipoprotein receptor family of proteins. The encoded preproprotein is proteolytically processed by furin to generate 515 kDa and 85 kDa subunits that form the mature receptor (PMID: 8546712). This receptor is involved in several cellular processes, including intracellular signaling, lipid homeostasis, and clearance of apoptotic cells. In addition, the encoded protein is necessary for the alpha 2-macroglobulin-mediated clearance of secreted amyloid precursor protein and beta-amyloid, the main component of amyloid plaques found in Alzheimer patients. Expression of this gene decreases with age and has been found to be lower than controls in brain tissue from Alzheimer’s disease patients. LRP1 ENSG00000123384 NA
133 adrenomedullin The protein encoded by this gene is a preprohormone which is cleaved to form two biologically active peptides, adrenomedullin and proadrenomedullin N-terminal 20 peptide. Adrenomedullin is a 52 aa peptide with several functions, including vasodilation, regulation of hormone secretion, promotion of angiogenesis, and antimicrobial activity. The antimicrobial activity is antibacterial, as the peptide has been shown to kill E. coli and S. aureus at low concentration. ADM ENSG00000148926 NA
1264 calponin 1 NA CNN1 ENSG00000130176 NA
3983 actin binding LIM protein 1 This gene encodes a cytoskeletal LIM protein that binds to actin filaments via a domain that is homologous to erythrocyte dematin. LIM domains, found in over 60 proteins, play key roles in the regulation of developmental pathways. LIM domains also function as protein-binding interfaces, mediating specific protein-protein interactions. The protein encoded by this gene could mediate such interactions between actin filaments and cytoplasmic targets. Alternatively spliced transcript variants encoding different isoforms have been identified. ABLIM1 ENSG00000099204 NA
2335 fibronectin 1 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. FN1 ENSG00000115414 NA
9516 lipopolysaccharide induced TNF factor Lipopolysaccharide is a potent stimulator of monocytes and macrophages, causing secretion of tumor necrosis factor-alpha (TNF-alpha) and other inflammatory mediators. This gene encodes lipopolysaccharide-induced TNF-alpha factor, which is a DNA-binding protein and can mediate the TNF-alpha expression by direct binding to the promoter region of the TNF-alpha gene. The transcription of this gene is induced by tumor suppressor p53 and has been implicated in the p53-induced apoptotic pathway. Mutations in this gene cause Charcot-Marie-Tooth disease type 1C (CMT1C) and may be involved in the carcinogenesis of extramammary Paget’s disease (EMPD). Multiple alternatively spliced transcript variants have been found for this gene. LITAF ENSG00000189067 NA
1675 complement factor D This gene encodes a member of the S1, or chymotrypsin, family of serine peptidases. This protease catalyzes the cleavage of factor B, the rate-limiting step of the alternative pathway of complement activation. This protein also functions as an adipokine, a cell signaling protein secreted by adipocytes, which regulates insulin secretion in mice. Mutations in this gene underlie complement factor D deficiency, which is associated with recurrent bacterial meningitis infections in human patients. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate the mature protease. CFD ENSG00000197766 NA
6695 sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 1 This gene encodes the protein core of a seminal plasma proteoglycan containing chondroitin- and heparan-sulfate chains. The protein’s function is unknown, although similarity to thyropin-type cysteine protease-inhibitors suggests its function may be related to protease inhibition. SPOCK1 ENSG00000152377 NA
10335 murine retrovirus integration site 1 homolog This gene is similar to a putative mouse tumor suppressor gene (Mrvi1) that is frequently disrupted by mouse AIDS-related virus (MRV). The encoded protein, which is found in the membrane of the endoplasmic reticulum, is similar to Jaw1, a lymphoid-restricted protein whose expression is down-regulated during lymphoid differentiation. This protein is a substrate of cGMP-dependent kinase-1 (PKG1) that can function as a regulator of IP3-induced calcium release. Studies in mouse suggest that MRV integration at Mrvi1 induces myeloid leukemia by altering the expression of a gene important for myeloid cell growth and/or differentiation, and thus this gene may function as a myeloid leukemia tumor suppressor gene. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene, and alternative translation start sites, including a non-AUG (CUG) start site, are used. MRVI1 ENSG00000072952 NA
4026 LIM domain containing preferred translocation partner in lipoma This gene encodes a member of a subfamily of LIM domain proteins that are characterized by an N-terminal proline-rich region and three C-terminal LIM domains. The encoded protein localizes to the cell periphery in focal adhesions and may be involved in cell-cell adhesion and cell motility. This protein also shuttles through the nucleus and may function as a transcriptional co-activator. This gene is located at the junction of certain disease-related chromosomal translocations, which result in the expression of chimeric proteins that may promote tumor growth. Alternative splicing results in multiple transcript variants. LPP ENSG00000145012 NA
151887 coiled-coil domain containing 80 NA CCDC80 ENSG00000091986 NA
3059 hematopoietic cell-specific Lyn substrate 1 NA HCLS1 ENSG00000180353 NA
284119 polymerase I and transcript release factor This gene encodes a protein that enables the dissociation of paused ternary polymerase I transcription complexes from the 3’ end of pre-rRNA transcripts. This protein regulates rRNA transcription by promoting the dissociation of transcription complexes and the reinitiation of polymerase I on nascent rRNA transcripts. This protein also localizes to caveolae at the plasma membrane and is thought to play a critical role in the formation of caveolae and the stabilization of caveolins. This protein translocates from caveolae to the cytoplasm after insulin stimulation. Caveolae contain truncated forms of this protein and may be the site of phosphorylation-dependent proteolysis. This protein is also thought to modify lipid metabolism and insulin-regulated gene expression. Mutations in this gene result in a disorder characterized by generalized lipodystrophy and muscular dystrophy. PTRF ENSG00000177469 NA
4060 lumican This gene encodes a member of the small leucine-rich proteoglycan (SLRP) family that includes decorin, biglycan, fibromodulin, keratocan, epiphycan, and osteoglycin. In these bifunctional molecules, the protein moiety binds collagen fibrils and the highly charged hydrophilic glycosaminoglycans regulate interfibrillar spacings. Lumican is the major keratan sulfate proteoglycan of the cornea but is also distributed in interstitial collagenous matrices throughout the body. Lumican may regulate collagen fibril organization and circumferential growth, corneal transparency, and epithelial cell migration and tissue repair. LUM ENSG00000139329 NA
23550 pleckstrin and Sec7 domain containing 4 NA PSD4 ENSG00000125637 NA
2268 FGR proto-oncogene, Src family tyrosine kinase This gene is a member of the Src family of protein tyrosine kinases (PTKs). The encoded protein contains N-terminal sites for myristylation and palmitylation, a PTK domain, and SH2 and SH3 domains which are involved in mediating protein-protein interactions with phosphotyrosine-containing and proline-rich motifs, respectively. The protein localizes to plasma membrane ruffles, and functions as a negative regulator of cell migration and adhesion triggered by the beta-2 integrin signal transduction pathway. Infection with Epstein-Barr virus results in the overexpression of this gene. Multiple alternatively spliced variants, encoding the same protein, have been identified. FGR ENSG00000000938 NA
2934 gelsolin The protein encoded by this gene binds to the ‘plus’ ends of actin monomers and filaments to prevent monomer exchange. The encoded calcium-regulated protein functions in both assembly and disassembly of actin filaments. Defects in this gene are a cause of familial amyloidosis Finnish type (FAF). Multiple transcript variants encoding several different isoforms have been found for this gene. GSN ENSG00000148180 NA
10398 myosin light chain 9 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. MYL9 ENSG00000101335 NA
ENSG00000263335 NA NA AF001548.5 ENSG00000263335 NA
9770 Ras association domain family member 2 This gene encodes a protein that contains a Ras association domain. Similar to its cattle and sheep counterparts, this gene is located near the prion gene. Two alternatively spliced transcripts encoding the same isoform have been reported. RASSF2 ENSG00000101265 NA
1513 cathepsin K The protein encoded by this gene is a lysosomal cysteine proteinase involved in bone remodeling and resorption. This protein, which is a member of the peptidase C1 protein family, is predominantly expressed in osteoclasts. However, the encoded protein is also expressed in a significant fraction of human breast cancers, where it could contribute to tumor invasiveness. Mutations in this gene are the cause of pycnodysostosis, an autosomal recessive disease characterized by osteosclerosis and short stature. CTSK ENSG00000143387 NA
3912 laminin subunit beta 1 Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. Several isoforms of each chain have been described. Different alpha, beta and gamma chain isomers combine to give rise to different heterotrimeric laminin isoforms which are designated by Arabic numerals in the order of their discovery, i.e. alpha1beta1gamma1 heterotrimer is laminin 1. The biological functions of the different chains and trimer molecules are largely unknown, but some of the chains have been shown to differ with respect to their tissue distribution, presumably reflecting diverse functions in vivo. This gene encodes the beta chain isoform laminin, beta 1. The beta 1 chain has 7 structurally distinct domains which it shares with other beta chain isomers. The C-terminal helical region containing domains I and II are separated by domain alpha, domains III and V contain several EGF-like repeats, and domains IV and VI have a globular conformation. Laminin, beta 1 is expressed in most tissues that produce basement membranes, and is one of the 3 chains constituting laminin 1, the first laminin isolated from Engelbreth-Holm-Swarm (EHS) tumor. A sequence in the beta 1 chain that is involved in cell attachment, chemotaxis, and binding to the laminin receptor was identified and shown to have the capacity to inhibit metastasis. LAMB1 ENSG00000091136 NA
5880 ras-related C3 botulinum toxin substrate 2 (rho family, small GTP binding protein Rac2) This gene encodes a member of the Ras superfamily of small guanosine triphosphate (GTP)-metabolizing proteins. The encoded protein localizes to the plasma membrane, where it regulates diverse processes, such as secretion, phagocytosis, and cell polarization. Activity of this protein is also involved in the generation of reactive oxygen species. Mutations in this gene are associated with neutrophil immunodeficiency syndrome. There is a pseudogene for this gene on chromosome 6. RAC2 ENSG00000128340 NA
3689 integrin subunit beta 2 This gene encodes an integrin beta chain, which combines with multiple different alpha chains to form different integrin heterodimers. Integrins are integral cell-surface proteins that participate in cell adhesion as well as cell-surface mediated signalling. The encoded protein plays an important role in immune response and defects in this gene cause leukocyte adhesion deficiency. Alternative splicing results in multiple transcript variants. ITGB2 ENSG00000160255 NA
4000 lamin A/C The nuclear lamina consists of a two-dimensional matrix of proteins located next to the inner nuclear membrane. The lamin family of proteins make up the matrix and are highly conserved in evolution. During mitosis, the lamina matrix is reversibly disassembled as the lamin proteins are phosphorylated. Lamin proteins are thought to be involved in nuclear stability, chromatin structure and gene expression. Vertebrate lamins consist of two types, A and B. Alternative splicing results in multiple transcript variants. Mutations in this gene lead to several diseases: Emery-Dreifuss muscular dystrophy, familial partial lipodystrophy, limb girdle muscular dystrophy, dilated cardiomyopathy, Charcot-Marie-Tooth disease, and Hutchinson-Gilford progeria syndrome. LMNA ENSG00000160789 NA
8829 neuropilin 1 This gene encodes one of two neuropilins, which contain specific protein domains which allow them to participate in several different types of signaling pathways that control cell migration. Neuropilins contain a large N-terminal extracellular domain, made up of complement-binding, coagulation factor V/VIII, and meprin domains. These proteins also contain a short membrane-spanning domain and a small cytoplasmic domain. Neuropilins bind many ligands and various types of co-receptors; they affect cell survival, migration, and attraction. Some of the ligands and co-receptors bound by neuropilins are vascular endothelial growth factor (VEGF) and semaphorin family members. Several alternatively spliced transcript variants that encode different protein isoforms have been described for this gene. NRP1 ENSG00000099250 NA
NA NA NA NA ENSG00000259716 TRUE
3915 laminin subunit gamma 1 Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins, composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively), have a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. Several isoforms of each chain have been described. Different alpha, beta and gamma chain isomers combine to give rise to different heterotrimeric laminin isoforms which are designated by Arabic numerals in the order of their discovery, i.e. alpha1beta1gamma1 heterotrimer is laminin 1. The biological functions of the different chains and trimer molecules are largely unknown, but some of the chains have been shown to differ with respect to their tissue distribution, presumably reflecting diverse functions in vivo. This gene encodes the gamma chain isoform laminin, gamma 1. The gamma 1 chain, formerly thought to be a beta chain, contains structural domains similar to beta chains, however, lacks the short alpha region separating domains I and II. The structural organization of this gene also suggested that it had diverged considerably from the beta chain genes. Embryos of transgenic mice in which both alleles of the gamma 1 chain gene were inactivated by homologous recombination, lacked basement membranes, indicating that laminin, gamma 1 chain is necessary for laminin heterotrimer assembly. It has been inferred by analogy with the strikingly similar 3’ UTR sequence in mouse laminin gamma 1 cDNA, that multiple polyadenylation sites are utilized in human to generate the 2 different sized mRNAs (5.5 and 7.5 kb) seen on Northern analysis. LAMC1 ENSG00000135862 NA
3486 insulin like growth factor binding protein 3 This gene is a member of the insulin-like growth factor binding protein (IGFBP) family and encodes a protein with an IGFBP domain and a thyroglobulin type-I domain. The protein forms a ternary complex with insulin-like growth factor acid-labile subunit (IGFALS) and either insulin-like growth factor (IGF) I or II. In this form, it circulates in the plasma, prolonging the half-life of IGFs and altering their interaction with cell surface receptors. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. IGFBP3 ENSG00000146674 NA
59 actin, alpha 2, smooth muscle, aorta The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. ACTA2 ENSG00000107796 NA
26136 testin LIM domain protein Cancer-associated chromosomal changes often involve regions containing fragile sites. This gene maps to a common fragile site on chromosome 7q31.2 designated FRA7G. This gene is similar to mouse Testin, a testosterone-responsive gene encoding a Sertoli cell secretory protein containing three LIM domains. LIM domains are double zinc-finger motifs that mediate protein-protein interactions between transcription factors, cytoskeletal proteins and signaling proteins. This protein is a negative regulator of cell growth and may act as a tumor suppressor. This scaffold protein may also play a role in cell adhesion, cell spreading and in the reorganization of the actin cytoskeleton. Multiple protein isoforms are encoded by transcript variants of this gene. TES ENSG00000135269 NA
5341 pleckstrin NA PLEK ENSG00000115956 NA
7462 linker for activation of T-cells family member 2 This gene is one of the contiguous genes at 7q11.23 commonly deleted in Williams syndrome, a multisystem developmental disorder. This gene consists of at least 14 exons, and its alternative splicing generates 3 transcript variants, all encoding the same protein. LAT2 ENSG00000086730 NA
7134 troponin C1, slow skeletal and cardiac type Troponin is a central regulatory protein of striated muscle contraction, and together with tropomyosin, is located on the actin filament. Troponin consists of 3 subunits: TnI, which is the inhibitor of actomyosin ATPase; TnT, which contains the binding site for tropomyosin; and TnC, the protein encoded by this gene. The binding of calcium to TnC abolishes the inhibitory action of TnI, thus allowing the interaction of actin with myosin, the hydrolysis of ATP, and the generation of tension. Mutations in this gene are associated with cardiomyopathy dilated type 1Z. TNNC1 ENSG00000114854 NA
4067 LYN proto-oncogene, Src family tyrosine kinase This gene encodes a tyrosine protein kinase, which may be involved in the regulation of mast cell degranulation and erythroid differentiation. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. LYN ENSG00000254087 NA
5997 regulator of G-protein signaling 2 Regulator of G protein signaling (RGS) family members are regulatory molecules that act as GTPase activating proteins (GAPs) for G alpha subunits of heterotrimeric G proteins. RGS proteins are able to deactivate G protein subunits of the Gi alpha, Go alpha and Gq alpha subtypes. They drive G proteins into their inactive GDP-bound forms. Regulator of G protein signaling 2 belongs to this family. The protein acts as a mediator of myeloid differentiation and may play a role in leukemogenesis. RGS2 ENSG00000116741 NA
101 ADAM metallopeptidase domain 8 This gene encodes a member of the ADAM (a disintegrin and metalloprotease domain) family. Members of this family are membrane-anchored proteins structurally related to snake venom disintegrins, and have been implicated in a variety of biological processes involving cell-cell and cell-matrix interactions, including fertilization, muscle development, and neurogenesis. The protein encoded by this gene may be involved in cell adhesion during neurodegeneration, and it is thought to be a target for allergic respiratory diseases, including asthma. Alternative splicing results in multiple transcript variants. ADAM8 ENSG00000151651 NA
5159 platelet derived growth factor receptor beta This gene encodes a cell surface tyrosine kinase receptor for members of the platelet-derived growth factor family. These growth factors are mitogens for cells of mesenchymal origin. The identity of the growth factor bound to a receptor monomer determines whether the functional receptor is a homodimer or a heterodimer, composed of both platelet-derived growth factor receptor alpha and beta polypeptides. This gene is flanked on chromosome 5 by the genes for granulocyte-macrophage colony-stimulating factor and macrophage-colony stimulating factor receptor; all three genes may be implicated in the 5-q syndrome. A translocation between chromosomes 5 and 12, that fuses this gene to that of the translocation, ETV6, leukemia gene, results in chronic myeloproliferative disorder with eosinophilia. PDGFRB ENSG00000113721 NA
171024 synaptopodin 2 NA SYNPO2 ENSG00000172403 NA
9659 phosphodiesterase 4D interacting protein The protein encoded by this gene serves to anchor phosphodiesterase 4D to the Golgi/centrosome region of the cell. Defects in this gene may be a cause of myeloproliferative disorder (MBD) associated with eosinophilia. Several transcript variants encoding different isoforms have been found for this gene. PDE4DIP ENSG00000178104 NA
2274 four and a half LIM domains 2 This gene encodes a member of the four-and-a-half-LIM-only protein family. Family members contain two highly conserved, tandemly arranged, zinc finger domains with four highly conserved cysteines binding a zinc atom in each zinc finger. This protein is thought to have a role in the assembly of extracellular membranes. Also, this gene is down-regulated during transformation of normal myoblasts to rhabdomyosarcoma cells and the encoded protein may function as a link between presenilin-2 and an intracellular signaling pathway. Multiple alternatively spliced variants encoding different isoforms have been identified. FHL2 ENSG00000115641 NA
10487 CAP, adenylate cyclase-associated protein 1 (yeast) The protein encoded by this gene is related to the S. cerevisiae CAP protein, which is involved in the cyclic AMP pathway. The human protein is able to interact with other molecules of the same protein, as well as with CAP2 and actin. Alternatively spliced transcript variants have been identified. CAP1 ENSG00000131236 NA
3956 galectin 1 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. This gene product may act as an autocrine negative growth factor that regulates cell proliferation. LGALS1 ENSG00000100097 NA
57449 pleckstrin homology and RhoGEF domain containing G5 This gene encodes a protein that activates the nuclear factor kappa B (NFKB1) signaling pathway. Mutations in this gene are associated with autosomal recessive distal spinal muscular atrophy. Multiple transcript variants encoding different isoforms have been found for this gene. PLEKHG5 ENSG00000171680 NA
51474 LIM domain and actin binding 1 This gene encodes a cytoskeleton-associated protein that inhibits actin filament depolymerization and cross-links filaments in bundles. It is downregulated in some cancer cell lines. Alternatively spliced transcript variants encoding different isoforms have been described for this gene, and expression of some of the variants may be independently regulated. LIMA1 ENSG00000050405 NA
79930 docking protein 3 NA DOK3 ENSG00000146094 NA
23526 Rho GTPase activating protein 45 NA ARHGAP45 ENSG00000180448 NA
4023 lipoprotein lipase LPL encodes lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue. LPL functions as a homodimer, and has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Severe mutations that cause LPL deficiency result in type I hyperlipoproteinemia, while less extreme mutations in LPL are linked to many disorders of lipoprotein metabolism. LPL ENSG00000175445 NA
4878 natriuretic peptide A The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. NPPA ENSG00000175206 NA
23216 TBC1 domain family member 1 TBC1D1 is the founding member of a family of proteins sharing a 180- to 200-amino acid TBC domain presumed to have a role in regulating cell growth and differentiation. These proteins share significant homology with TRE2 (USP6; MIM 604334), yeast Bub2, and CDC16 (MIM 603461) (White et al., 2000 [PubMed 10965142]). TBC1D1 ENSG00000065882 NA
23336 synemin The protein encoded by this gene is an intermediate filament (IF) family member. IF proteins are cytoskeletal proteins that confer resistance to mechanical stress and are encoded by a dispersed multigene family. This protein has been found to form a linkage between desmin, which is a subunit of the IF network, and the extracellular matrix, and provides an important structural support in muscle. Two alternatively spliced variants encoding different isoforms have been described for this gene. SYNM ENSG00000182253 NA
6281 S100 calcium binding protein A10 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in exocytosis and endocytosis. S100A10 ENSG00000197747 NA
23362 pleckstrin and Sec7 domain containing 3 NA PSD3 ENSG00000156011 NA
63940 G-protein signaling modulator 3 NA GPSM3 ENSG00000213654 NA
2202 EGF containing fibulin like extracellular matrix protein 1 This gene encodes a member of the fibulin family of extracellular matrix glycoproteins. Like all members of this family, the encoded protein contains tandemly repeated epidermal growth factor-like repeats followed by a C-terminus fibulin-type domain. This gene is upregulated in malignant gliomas and may play a role in the aggressive nature of these tumors. Mutations in this gene are associated with Doyne honeycomb retinal dystrophy. Alternatively spliced transcript variants that encode the same protein have been described. EFEMP1 ENSG00000115380 NA
397 Rho GDP dissociation inhibitor beta Members of the Rho (or ARH) protein family (see MIM 165390) and other Ras-related small GTP-binding proteins (see MIM 179520) are involved in diverse cellular events, including cell signaling, proliferation, cytoskeletal organization, and secretion. The GTP-binding proteins are active only in the GTP-bound state. At least 3 classes of proteins tightly regulate cycling between the GTP-bound and GDP-bound states: GTPase-activating proteins (GAPs), guanine nucleotide-releasing factors (GRFs), and GDP-dissociation inhibitors (GDIs). The GDIs, including ARHGDIB, decrease the rate of GDP dissociation from Ras-like GTPases (summary by Scherle et al., 1993 [PubMed 8356058]). ARHGDIB ENSG00000111348 NA
57580 phosphatidylinositol-3,4,5-trisphosphate dependent Rac exchange factor 1 The protein encoded by this gene acts as a guanine nucleotide exchange factor for the RHO family of small GTP-binding proteins (RACs). It has been shown to bind to and activate RAC1 by exchanging bound GDP for free GTP. The encoded protein, which is found mainly in the cytoplasm, is activated by phosphatidylinositol-3,4,5-trisphosphate and the beta-gamma subunits of heterotrimeric G proteins. PREX1 ENSG00000124126 NA
56944 olfactomedin like 3 NA OLFML3 ENSG00000116774 NA
51299 neuritin 1 This gene encodes a member of the neuritin family, and is expressed in postmitotic-differentiating neurons of the developmental nervous system and neuronal structures associated with plasticity in the adult. The expression of this gene can be induced by neural activity and neurotrophins. The encoded protein contains a consensus cleavage signal found in glycosylphosphatidylinositol (GPI)-anchored proteins. The encoded protein promotes neurite outgrowth and arborization, suggesting its role in promoting neuritogenesis. Overexpression of the encoded protein may be associated with astrocytoma progression. Alternative splicing results in multiple transcript variants. NRN1 ENSG00000124785 NA
9473 thymocyte selection associated family member 2 NA THEMIS2 ENSG00000130775 NA
4638 myosin light chain kinase This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3’ region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts. MYLK ENSG00000065534 NA
ENSG00000263065 NA NA AF001548.6 ENSG00000263065 NA
9744 ArfGAP with coiled-coil, ankyrin repeat and PH domains 1 NA ACAP1 ENSG00000072818 NA
7057 thrombospondin 1 The protein encoded by this gene is a subunit of a disulfide-linked homotrimeric protein. This protein is an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein can bind to fibrinogen, fibronectin, laminin, type V collagen and integrins alpha-V/beta-1. This protein has been shown to play roles in platelet aggregation, angiogenesis, and tumorigenesis. THBS1 ENSG00000137801 NA
257106 Rho GTPase activating protein 30 NA ARHGAP30 ENSG00000186517 NA
ENSG00000180139 ACTA2 antisense RNA 1 NA ACTA2-AS1 ENSG00000180139 NA
7148 tenascin XB This gene encodes a member of the tenascin family of extracellular matrix glycoproteins. The tenascins have anti-adhesive effects, as opposed to fibronectin which is adhesive. This protein is thought to function in matrix maturation during wound healing, and its deficiency has been associated with the connective tissue disorder Ehlers-Danlos syndrome. This gene localizes to the major histocompatibility complex (MHC) class III region on chromosome 6. It is one of four genes in this cluster which have been duplicated. The duplicated copy of this gene is incomplete and is a pseudogene which is transcribed but does not encode a protein. The structure of this gene is unusual in that it overlaps the CREBL1 and CYP21A2 genes at its 5’ and 3’ ends, respectively. Multiple transcript variants encoding different isoforms have been found for this gene. TNXB ENSG00000168477 NA
5176 serpin family F member 1 The protein encoded by this gene is a member of the serpin family, although it does not display the serine protease inhibitory activity shown by many of the other serpin family members. The encoded protein is secreted and strongly inhibits angiogenesis. In addition, this protein is a neurotrophic factor involved in neuronal differentiation in retinoblastoma cells. SERPINF1 ENSG00000132386 NA
3725 Jun proto-oncogene, AP-1 transcription factor subunit This gene is the putative transforming gene of avian sarcoma virus 17. It encodes a protein which is highly similar to the viral protein, and which interacts directly with specific target DNA sequences to regulate gene expression. This gene is intronless and is mapped to 1p32-p31, a chromosomal region involved in both translocations and deletions in human malignancies. JUN ENSG00000177606 NA
126308 MOB kinase activator 3A NA MOB3A ENSG00000172081 NA
10636 regulator of G-protein signaling 14 This gene encodes a member of the regulator of G-protein signaling family. This protein contains one RGS domain, two Raf-like Ras-binding domains (RBDs), and one GoLoco domain. The protein attenuates the signaling activity of G-proteins by binding, through its GoLoco domain, to specific types of activated, GTP-bound G alpha subunits. Acting as a GTPase activating protein (GAP), the protein increases the rate of conversion of the GTP to GDP. This hydrolysis allows the G alpha subunits to bind G beta/gamma subunit heterodimers, forming inactive G-protein heterotrimers, thereby terminating the signal. Alternate transcriptional splice variants of this gene have been observed but have not been thoroughly characterized. RGS14 ENSG00000169220 NA
51232 cysteine rich transmembrane BMP regulator 1 (chordin-like) This gene encodes a transmembrane protein containing six cysteine-rich repeat domains and an insulin-like growth factor-binding domain. The encoded protein may play a role in tissue development through interactions with members of the transforming growth factor beta family, such as bone morphogenetic proteins. CRIM1 ENSG00000150938 NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_",1,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
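
The export step above is repeated once per factor with only the factor index changing. A minimal sketch of how it could be wrapped in a function follows. The name write_factor_genes and its arguments are hypothetical and not part of the original analysis, and the toy call writes to a temporary directory rather than to ../utilities.

```r
# Hypothetical helper (an assumption, not the author's code): write the query
# IDs for factor k to a one-column text file and return the file path.
write_factor_genes <- function(ids, k, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(as.character(ids), path,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Toy usage with two Ensembl IDs taken from the Factor 1 table (MYLK, THBS1).
demo_path <- write_factor_genes(c("ENSG00000065534", "ENSG00000137801"),
                                k = 1, out_dir = tempdir())
readLines(demo_path)
```

In the real workflow, ids would be out$query and out_dir the ../utilities/GTEX2013_sparse_fac_sqrt directory used above.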

Factor 2 Annotations

out <- mygene::queryMany(gene_list[2,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
kable(as.data.frame(out))
symbol X_id summary query name
FN1 2335 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. ENSG00000115414 fibronectin 1
KRT10 3858 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. ENSG00000186395 keratin 10
COL18A1 80781 This gene encodes the alpha chain of type XVIII collagen. This collagen is one of the multiplexins, extracellular matrix proteins that contain multiple triple-helix domains (collagenous domains) interrupted by non-collagenous domains. A long isoform of the protein has an N-terminal domain that is homologous to the extracellular part of frizzled receptors. Proteolytic processing at several endogenous cleavage sites in the C-terminal domain results in production of endostatin, a potent antiangiogenic protein that is able to inhibit angiogenesis and tumor growth. Mutations in this gene are associated with Knobloch syndrome. The main features of this syndrome involve retinal abnormalities, so type XVIII collagen may play an important role in retinal structure and in neural tube closure. Alternative splicing results in multiple transcript variants. ENSG00000182871 collagen type XVIII alpha 1 chain
KRT1 3848 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000167768 keratin 1
APOA1 335 This gene encodes apolipoprotein A-I, which is the major protein component of high density lipoprotein (HDL) in plasma. The encoded preproprotein is proteolytically processed to generate the mature protein, which promotes cholesterol efflux from tissues to the liver for excretion, and is a cofactor for lecithin cholesterolacyltransferase (LCAT), an enzyme responsible for the formation of most plasma cholesteryl esters. This gene is closely linked with two other apolipoprotein genes on chromosome 11. Defects in this gene are associated with HDL deficiencies, including Tangier disease, and with systemic non-neuropathic amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein. ENSG00000118137 apolipoprotein A1
NOV 4856 The protein encoded by this gene is a small secreted cysteine-rich protein and a member of the CCN family of regulatory proteins. CCN family proteins associate with the extracellular matrix and play an important role in cardiovascular and skeletal development, fibrosis and cancer development. ENSG00000136999 nephroblastoma overexpressed
ITIH3 3699 This gene encodes the heavy chain subunit of the pre-alpha-trypsin inhibitor complex. This complex may stabilize the extracellular matrix through its ability to bind hyaluronic acid. Polymorphisms of this gene may be associated with increased risk for schizophrenia and major depressive disorder. This gene is present in an inter-alpha-trypsin inhibitor family gene cluster on chromosome 3. ENSG00000162267 inter-alpha-trypsin inhibitor heavy chain 3
MBP 4155 The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. ENSG00000197971 myelin basic protein
MGP 4256 The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. ENSG00000111341 matrix Gla protein
AGT 183 The protein encoded by this gene, pre-angiotensinogen or angiotensinogen precursor, is expressed in the liver and is cleaved by the enzyme renin in response to lowered blood pressure. The resulting product, angiotensin I, is then cleaved by angiotensin converting enzyme (ACE) to generate the physiologically active enzyme angiotensin II. The protein is involved in maintaining blood pressure and in the pathogenesis of essential hypertension and preeclampsia. Mutations in this gene are associated with susceptibility to essential hypertension, and can cause renal tubular dysgenesis, a severe disorder of renal tubular development. Defects in this gene have also been associated with non-familial structural atrial fibrillation, and inflammatory bowel disease. ENSG00000135744 angiotensinogen
ALB 213 Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. ENSG00000163631 albumin
CP 1356 The protein encoded by this gene is a metalloprotein that binds most of the copper in plasma and is involved in the peroxidation of Fe(II) transferrin to Fe(III) transferrin. Mutations in this gene cause aceruloplasminemia, which results in iron accumulation and tissue damage, and is associated with diabetes and neurologic abnormalities. Two transcript variants, one protein-coding and the other not protein-coding, have been found for this gene. ENSG00000047457 ceruloplasmin (ferroxidase)
ARHGEF10L 55160 ARHGEF10L is a member of the RhoGEF family of guanine nucleotide exchange factors (GEFs) that activate Rho GTPases (Winkler et al., 2005 [PubMed 16112081]). ENSG00000074964 Rho guanine nucleotide exchange factor 10 like
TAGLN 6876 The protein encoded by this gene is a transformation and shape-change sensitive actin cross-linking/gelling protein found in fibroblasts and smooth muscle. Its expression is down-regulated in many cell lines, and this down-regulation may be an early and sensitive marker for the onset of transformation. A functional role of this protein is unclear. Two transcript variants encoding the same protein have been found for this gene. ENSG00000149591 transgelin
THBS1 7057 The protein encoded by this gene is a subunit of a disulfide-linked homotrimeric protein. This protein is an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein can bind to fibrinogen, fibronectin, laminin, type V collagen and integrins alpha-V/beta-1. This protein has been shown to play roles in platelet aggregation, angiogenesis, and tumorigenesis. ENSG00000137801 thrombospondin 1
ACTA2 59 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in smooth muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. ENSG00000107796 actin, alpha 2, smooth muscle, aorta
PRL 5617 This gene encodes the anterior pituitary hormone prolactin. This secreted hormone is a growth regulator for many tissues, including cells of the immune system. It may also play a role in cell survival by suppressing apoptosis, and it is essential for lactation. Alternative splicing results in multiple transcript variants that encode the same protein. ENSG00000172179 prolactin
VIM 7431 This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. ENSG00000026025 vimentin
MYH11 4629 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000133392 myosin, heavy chain 11, smooth muscle
CLU 1191 The protein encoded by this gene is a secreted chaperone that can under some stress conditions also be found in the cell cytosol. It has been suggested to be involved in several basic biological events such as cell death, tumor progression, and neurodegenerative disorders. Alternate splicing results in both coding and non-coding variants. ENSG00000120885 clusterin
KRT2 3849 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000172867 keratin 2
ACTC1 70 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). ENSG00000159251 actin, alpha, cardiac muscle 1
FGA 2243 This gene encodes the alpha subunit of the coagulation factor fibrinogen, which is a component of the blood clot. Following vascular injury, the encoded preproprotein is proteolytically processed by thrombin during the conversion of fibrinogen to fibrin. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia, afibrinogenemia and renal amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. ENSG00000171560 fibrinogen alpha chain
APOC3 345 Apolipoprotein C-III is a very low density lipoprotein (VLDL) protein. APOC3 inhibits lipoprotein lipase and hepatic lipase; it is thought to delay catabolism of triglyceride-rich particles. The APOA1, APOC3 and APOA4 genes are closely linked in both rat and human genomes. The A-I and A-IV genes are transcribed from the same strand, while the A-I and C-III genes are convergently transcribed. An increase in apoC-III levels induces the development of hypertriglyceridemia. ENSG00000110245 apolipoprotein C3
CPS1 1373 The mitochondrial enzyme encoded by this gene catalyzes synthesis of carbamoyl phosphate from ammonia and bicarbonate. This reaction is the first committed step of the urea cycle, which is important in the removal of excess urea from cells. The encoded protein may also represent a core mitochondrial nucleoid protein. Three transcript variants encoding different isoforms have been found for this gene. The shortest isoform may not be localized to the mitochondrion. Mutations in this gene have been associated with carbamoyl phosphate synthetase deficiency, susceptibility to persistent pulmonary hypertension, and susceptibility to venoocclusive disease after bone marrow transplantation. ENSG00000021826 carbamoyl-phosphate synthase 1
PTGDS 5730 The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may be also involved in the regulation of non-rapid eye movement sleep. ENSG00000107317 prostaglandin D2 synthase
FGB 2244 The protein encoded by this gene is the beta component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including afibrinogenemia, dysfibrinogenemia, hypodysfibrinogenemia and thrombotic tendency. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000171564 fibrinogen beta chain
PPP1R3C 5507 This gene encodes a regulatory subunit of protein phosphatase-1 (PP1). PP1 catalyzes reversible protein phosphorylation, which is important in a wide range of cellular activities: neuronal, muscular, RNA splicing, protein synthesis, cell death, and glycogen metabolism, to name just a few. By interacting with different regulatory subunits, PP1 is directed to different parts of the cell, to different substrates, or to respond to extracellular signals. ENSG00000119938 protein phosphatase 1 regulatory subunit 3C
HPD 3242 The protein encoded by this gene is an enzyme in the catabolic pathway of tyrosine. The encoded protein catalyzes the conversion of 4-hydroxyphenylpyruvate to homogentisate. Defects in this gene are a cause of tyrosinemia type 3 (TYRO3) and hawkinsinuria (HAWK). Two transcript variants encoding different isoforms have been found for this gene. ENSG00000158104 4-hydroxyphenylpyruvate dioxygenase
IGFBP2 3485 The protein encoded by this gene is one of six similar proteins that bind insulin-like growth factors I and II (IGF-I and IGF-II). The encoded protein can be secreted into the bloodstream, where it binds IGF-I and IGF-II with high affinity, or it can remain intracellular, interacting with many different ligands. High expression levels of this protein promote the growth of several types of tumors and may be predictive of the chances of recovery of the patient. Several transcript variants, one encoding a secreted isoform and the others encoding nonsecreted isoforms, have been found for this gene. ENSG00000115457 insulin like growth factor binding protein 2
SDC1 6382 The protein encoded by this gene is a transmembrane (type I) heparan sulfate proteoglycan and is a member of the syndecan proteoglycan family. The syndecans mediate cell binding, cell signaling, and cytoskeletal organization and syndecan receptors are required for internalization of the HIV-1 tat protein. The syndecan-1 protein functions as an integral membrane protein and participates in cell proliferation, cell migration and cell-matrix interactions via its receptor for extracellular matrix proteins. Altered syndecan-1 expression has been detected in several different tumor types. While several transcript variants may exist for this gene, the full-length natures of only two have been described to date. These two represent the major variants of this gene and encode the same protein. ENSG00000115884 syndecan 1
DMKN 93099 This gene is upregulated in inflammatory diseases, and it was first observed as expressed in the differentiated layers of skin. The most interesting aspect of this gene is the differential use of promoters and terminators to generate isoforms with unique cellular distributions and domain components. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. ENSG00000161249 dermokine
TNNT2 7139 The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms, however, the full-length nature of some of these variants has not yet been determined. ENSG00000118194 troponin T2, cardiac type
CHD3 1107 This gene encodes a member of the CHD family of proteins which are characterized by the presence of chromo (chromatin organization modifier) domains and SNF2-related helicase/ATPase domains. This protein is one of the components of a histone deacetylase complex referred to as the Mi-2/NuRD complex which participates in the remodeling of chromatin by deacetylating histones. Chromatin remodeling is essential for many processes including transcription. Autoantibodies against this protein are found in a subset of patients with dermatomyositis. Three alternatively spliced transcripts encoding different isoforms have been described. ENSG00000170004 chromodomain helicase DNA binding protein 3
SORBS2 8470 Arg and c-Abl represent the mammalian members of the Abelson family of non-receptor protein-tyrosine kinases. They interact with the Arg/Abl binding proteins via the SH3 domains present in the carboxy end of the latter group of proteins. This gene encodes the sorbin and SH3 domain containing 2 protein. It has three C-terminal SH3 domains and an N-terminal sorbin homology (SoHo) domain that interacts with lipid raft proteins. The subcellular localization of this protein in epithelial and cardiac muscle cells suggests that it functions as an adapter protein to assemble signaling complexes in stress fibers, and that it is a potential link between Abl family kinases and the actin cytoskeleton. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000154556 sorbin and SH3 domain containing 2
DCN 1634 This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. ENSG00000011465 decorin
CELF2 10659 Members of the CELF/BRUNOL protein family contain two N-terminal RNA recognition motif (RRM) domains, one C-terminal RRM domain, and a divergent segment of 160-230 aa between the second and third RRM domains. Members of this protein family regulate pre-mRNA alternative splicing and may also be involved in mRNA editing, and translation. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000048740 CUGBP, Elav-like family member 2
DES 1674 This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. ENSG00000175084 desmin
ORM1 5004 This gene encodes a key acute phase plasma protein. Because of its increase due to acute inflammation, this protein is classified as an acute-phase reactant. The specific function of this protein has not yet been determined; however, it may be involved in aspects of immunosuppression. ENSG00000229314 orosomucoid 1
ACAT1 38 This gene encodes a mitochondrially localized enzyme that catalyzes the reversible formation of acetoacetyl-CoA from two molecules of acetyl-CoA. Defects in this gene are associated with 3-ketothiolase deficiency, an inborn error of isoleucine catabolism characterized by urinary excretion of 2-methyl-3-hydroxybutyric acid, 2-methylacetoacetic acid, tiglylglycine, and butanone. ENSG00000075239 acetyl-CoA acetyltransferase 1
HP 3240 This gene encodes a preproprotein, which is processed to yield both alpha and beta chains, which subsequently combine as a tetramer to produce haptoglobin. Haptoglobin functions to bind free plasma hemoglobin, which allows degradative enzymes to gain access to the hemoglobin, while at the same time preventing loss of iron through the kidneys and protecting the kidneys from damage by hemoglobin. Mutations in this gene and/or its regulatory regions cause ahaptoglobinemia or hypohaptoglobinemia. This gene has also been linked to diabetic nephropathy, the incidence of coronary artery disease in type 1 diabetes, Crohn’s disease, inflammatory disease behavior, primary sclerosing cholangitis, susceptibility to idiopathic Parkinson’s disease, and a reduced incidence of Plasmodium falciparum malaria. The protein encoded also exhibits antimicrobial activity against bacteria. A similar duplicated gene is located next to this gene on chromosome 16. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000257017 haptoglobin
TOB1 10140 This gene encodes a member of the transducer of erbB-2 /B-cell translocation gene protein family. Members of this family are anti-proliferative factors that have the potential to regulate cell growth. The encoded protein may function as a tumor suppressor. Alternate splicing results in multiple transcript variants. ENSG00000141232 transducer of ERBB2, 1
CFH 3075 This gene is a member of the Regulator of Complement Activation (RCA) gene cluster and encodes a protein with twenty short consensus repeat (SCR) domains. This protein is secreted into the bloodstream and has an essential role in the regulation of complement activation, restricting this innate defense mechanism to microbial infections. Mutations in this gene have been associated with hemolytic-uremic syndrome (HUS) and chronic hypocomplementemic nephropathy. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. ENSG00000000971 complement factor H
NCDN 23154 This gene encodes a leucine-rich cytoplasmic protein, which is highly similar to a mouse protein that negatively regulates Ca/calmodulin-dependent protein kinase II phosphorylation and may be essential for spatial learning processes. Several alternatively spliced transcript variants of this gene have been described. ENSG00000020129 neurochondrin
COL1A1 1277 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. ENSG00000108821 collagen type I alpha 1
FTH1 2495 This gene encodes the heavy subunit of ferritin, the major intracellular iron storage protein in prokaryotes and eukaryotes. It is composed of 24 subunits of the heavy and light ferritin chains. Variation in ferritin subunit composition may affect the rates of iron uptake and release in different tissues. A major function of ferritin is the storage of iron in a soluble and nontoxic state. Defects in ferritin proteins are associated with several neurodegenerative diseases. This gene has multiple pseudogenes. Several alternatively spliced transcript variants have been observed, but their biological validity has not been determined. ENSG00000167996 ferritin heavy chain 1
PRSS1 5644 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. ENSG00000204983 protease, serine 1
GLUL 2752 The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. ENSG00000135821 glutamate-ammonia ligase
IGFBP7 3490 This gene encodes a member of the insulin-like growth factor (IGF)-binding protein (IGFBP) family. IGFBPs bind IGFs with high affinity, and regulate IGF availability in body fluids and tissues and modulate IGF binding to its receptors. This protein binds IGF-I and IGF-II with relatively low affinity, and belongs to a subfamily of low-affinity IGFBPs. It also stimulates prostacyclin production and cell adhesion. Alternatively spliced transcript variants encoding different isoforms have been described for this gene, and one variant has been associated with retinal arterial macroaneurysm (PMID:21835307). ENSG00000163453 insulin like growth factor binding protein 7
DSP 1832 This gene encodes a protein that anchors intermediate filaments to desmosomal plaques and forms an obligate component of functional desmosomes. Mutations in this gene are the cause of several cardiomyopathies and keratodermas, including skin fragility-woolly hair syndrome. Alternative splicing results in multiple transcript variants. ENSG00000096696 desmoplakin
KANK1 23189 The protein encoded by this gene belongs to the Kank family of proteins, which contain multiple ankyrin repeat domains. This family member functions in cytoskeleton formation by regulating actin polymerization. This gene is a candidate tumor suppressor for renal cell carcinoma. Mutations in this gene cause cerebral palsy spastic quadriplegic type 2, a central nervous system development disorder. A t(5;9) translocation results in fusion of the platelet-derived growth factor receptor beta gene (PDGFRB) on chromosome 5 with this gene in a myeloproliferative neoplasm featuring severe thrombocythemia. Alternative splicing of this gene results in multiple transcript variants. A related pseudogene has been identified on chromosome 20. ENSG00000107104 KN motif and ankyrin repeat domains 1
FGG 2266 The protein encoded by this gene is the gamma component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia and thrombophilia. Alternative splicing results in transcript variants encoding different isoforms. ENSG00000171557 fibrinogen gamma chain
KIF1A 547 The protein encoded by this gene is a member of the kinesin family and functions as an anterograde motor protein that transports membranous organelles along axonal microtubules. Mutations at this locus have been associated with spastic paraplegia-30 and hereditary sensory neuropathy IIC. Alternatively spliced transcript variants encoding distinct isoforms have been described. ENSG00000130294 kinesin family member 1A
FASN 2194 The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. ENSG00000169710 fatty acid synthase
PFKFB3 5209 The protein encoded by this gene belongs to a family of bifunctional proteins that are involved in both the synthesis and degradation of fructose-2,6-bisphosphate, a regulatory molecule that controls glycolysis in eukaryotes. The encoded protein has a 6-phosphofructo-2-kinase activity that catalyzes the synthesis of fructose-2,6-bisphosphate (F2,6BP), and a fructose-2,6-biphosphatase activity that catalyzes the degradation of F2,6BP. This protein is required for cell cycle progression and prevention of apoptosis. It functions as a regulator of cyclin-dependent kinase 1, linking glucose metabolism to cell proliferation and survival in tumor cells. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000170525 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 3
NEBL 10529 This gene encodes a nebulin like protein that is abundantly expressed in cardiac muscle. The encoded protein binds actin and interacts with thin filaments and Z-line associated proteins in striated muscle. This protein may be involved in cardiac myofibril assembly. A shorter isoform of this protein termed LIM nebulette is expressed in non-muscle cells and may function as a component of focal adhesion complexes. Alternate splicing results in multiple transcript variants. ENSG00000078114 nebulette
ALDOB 229 Fructose-1,6-bisphosphate aldolase (EC 4.1.2.13) is a tetrameric glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Vertebrates have 3 aldolase isozymes which are distinguished by their electrophoretic and catalytic properties. Differences indicate that aldolases A, B, and C are distinct proteins, the products of a family of related ‘housekeeping’ genes exhibiting developmentally regulated expression of the different isozymes. The developing embryo produces aldolase A, which is produced in even greater amounts in adult muscle where it can be as much as 5% of total cellular protein. In adult liver, kidney and intestine, aldolase A expression is repressed and aldolase B is produced. In brain and other nervous tissue, aldolase A and C are expressed about equally. There is a high degree of homology between aldolase A and C. Defects in ALDOB cause hereditary fructose intolerance. ENSG00000136872 aldolase, fructose-bisphosphate B
POMC 5443 This gene encodes a preproprotein that undergoes extensive, tissue-specific, post-translational processing via cleavage by subtilisin-like enzymes known as prohormone convertases. There are eight potential cleavage sites within the preproprotein and, depending on tissue type and the available convertases, processing may yield as many as ten biologically active peptides involved in diverse cellular functions. The encoded protein is synthesized mainly in corticotroph cells of the anterior pituitary where four cleavage sites are used; adrenocorticotrophin, essential for normal steroidogenesis and the maintenance of normal adrenal weight, and lipotropin beta are the major end products. In other tissues, including the hypothalamus, placenta, and epithelium, all cleavage sites may be used, giving rise to peptides with roles in pain and energy homeostasis, melanocyte stimulation, and immune modulation. These include several distinct melanotropins, lipotropins, and endorphins that are contained within the adrenocorticotrophin and beta-lipotropin peptides. The antimicrobial melanotropin alpha peptide exhibits antibacterial and antifungal activity. Mutations in this gene have been associated with early onset obesity, adrenal insufficiency, and red hair pigmentation. Alternatively spliced transcript variants encoding the same protein have been described. ENSG00000115138 proopiomelanocortin
PLN 5350 The protein encoded by this gene is found as a pentamer and is a major substrate for the cAMP-dependent protein kinase in cardiac muscle. The encoded protein is an inhibitor of cardiac muscle sarcoplasmic reticulum Ca(2+)-ATPase in the unphosphorylated state, but inhibition is relieved upon phosphorylation of the protein. The subsequent activation of the Ca(2+) pump leads to enhanced muscle relaxation rates, thereby contributing to the inotropic response elicited in heart by beta-agonists. The encoded protein is a key regulator of cardiac diastolic function. Mutations in this gene are a cause of inherited human dilated cardiomyopathy with refractory congestive heart failure, and also familial hypertrophic cardiomyopathy. ENSG00000198523 phospholamban
CLDN1 9076 Tight junctions represent one mode of cell-to-cell adhesion in epithelial or endothelial cell sheets, forming continuous seals around cells and serving as a physical barrier to prevent solutes and water from passing freely through the paracellular space. These junctions are comprised of sets of continuous networking strands in the outwardly facing cytoplasmic leaflet, with complementary grooves in the inwardly facing extracytoplasmic leaflet. The protein encoded by this gene, a member of the claudin family, is an integral membrane protein and a component of tight junction strands. Loss of function mutations result in neonatal ichthyosis-sclerosing cholangitis syndrome. ENSG00000163347 claudin 1
A1BG 1 The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins. ENSG00000121410 alpha-1-B glycoprotein
GH1 2688 The protein encoded by this gene is a member of the somatotropin/prolactin family of hormones which play an important role in growth control. The gene, along with four other related genes, is located at the growth hormone locus on chromosome 17 where they are interspersed in the same transcriptional orientation; an arrangement which is thought to have evolved by a series of gene duplications. The five genes share a remarkably high degree of sequence identity. Alternative splicing generates additional isoforms of each of the five growth hormones, leading to further diversity and potential for specialization. This particular family member is expressed in the pituitary but not in placental tissue as is the case for the other four genes in the growth hormone locus. Mutations in or deletions of the gene lead to growth hormone deficiency and short stature. ENSG00000259384 growth hormone 1
PTPN3 5774 The protein encoded by this gene is a member of the protein tyrosine phosphatase (PTP) family. PTPs are known to be signaling molecules that regulate a variety of cellular processes including cell growth, differentiation, mitotic cycle, and oncogenic transformation. This protein contains a C-terminal PTP domain and an N-terminal domain homologous to the band 4.1 superfamily of cytoskeletal-associated proteins. P97, a cell cycle regulator involved in a variety of membrane related functions, has been shown to be a substrate of this PTP. This PTP was also found to interact with, and be regulated by adaptor protein 14-3-3 beta. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000070159 protein tyrosine phosphatase, non-receptor type 3
PALM 5064 This gene encodes a member of the paralemmin protein family. The product of this gene is a prenylated and palmitoylated phosphoprotein that associates with the cytoplasmic face of plasma membranes and is implicated in plasma membrane dynamics in neurons and other cell types. Several alternatively spliced transcript variants have been identified, but the full-length nature of only two transcript variants has been determined. ENSG00000099864 paralemmin
ITM2C 81618 NA ENSG00000135916 integral membrane protein 2C
PSD 5662 This gene encodes a Pleckstrin homology and SEC7 domains-containing protein that functions as a guanine nucleotide exchange factor. The encoded protein regulates signal transduction by activating ADP-ribosylation factor 6. Alternative splicing results in multiple transcript variants. ENSG00000059915 pleckstrin and Sec7 domain containing
CPA1 1357 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. ENSG00000091704 carboxypeptidase A1
SERPINA1 5265 The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. ENSG00000197249 serpin family A member 1
IGFBP5 3488 NA ENSG00000115461 insulin like growth factor binding protein 5
HPX 3263 This gene encodes a plasma glycoprotein that binds heme with high affinity. The encoded protein is an acute phase protein that transports heme from the plasma to the liver and may be involved in protecting cells from oxidative stress. ENSG00000110169 hemopexin
RGS5 8490 This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. ENSG00000143248 regulator of G-protein signaling 5
PNLIP 5406 This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. ENSG00000175535 pancreatic lipase
CA11 770 Carbonic anhydrases (CAs) are a large family of zinc metalloenzymes that catalyze the reversible hydration of carbon dioxide. They participate in a variety of biological processes, including respiration, calcification, acid-base balance, bone resorption, and the formation of aqueous humor, cerebrospinal fluid, saliva, and gastric acid. They show extensive diversity in tissue distribution and in their subcellular localization. CA XI is likely a secreted protein, however, radical changes at active site residues completely conserved in CA isozymes with catalytic activity, make it unlikely that it has carbonic anhydrase activity. It shares properties in common with two other acatalytic CA isoforms, CA VIII and CA X. CA XI is most abundantly expressed in brain, and may play a general role in the central nervous system. ENSG00000063180 carbonic anhydrase 11
DLG4 1742 This gene encodes a member of the membrane-associated guanylate kinase (MAGUK) family. It heteromultimerizes with another MAGUK protein, DLG2, and is recruited into NMDA receptor and potassium channel clusters. These two MAGUK proteins may interact at postsynaptic sites to form a multimeric scaffold for the clustering of receptors, ion channels, and associated signaling proteins. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000132535 discs large MAGUK scaffold protein 4
PEBP1 5037 This gene encodes a member of the phosphatidylethanolamine-binding family of proteins and has been shown to modulate multiple signaling pathways, including the MAP kinase (MAPK), NF-kappa B, and glycogen synthase kinase-3 (GSK-3) signaling pathways. The encoded protein can be further processed to form a smaller cleavage product, hippocampal cholinergic neurostimulating peptide (HCNP), which may be involved in neural development. This gene has been implicated in numerous human cancers and may act as a metastasis suppressor gene. Multiple pseudogenes of this gene have been identified in the genome. ENSG00000089220 phosphatidylethanolamine binding protein 1
NPNT 255743 NA ENSG00000168743 nephronectin
RP11-862L9.3 ENSG00000266844 NA ENSG00000266844 NA
GP2 2813 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. ENSG00000169347 glycoprotein 2
KRT14 3861 This gene encodes a member of the keratin family, the most diverse group of intermediate filaments. This gene product, a type I keratin, is usually found as a heterotetramer with two keratin 5 molecules, a type II keratin. Together they form the cytoskeleton of epithelial cells. Mutations in the genes for these keratins are associated with epidermolysis bullosa simplex. At least one pseudogene has been identified at 17p12-p11. ENSG00000186847 keratin 14
CTD-2619J13.8 ENSG00000268230 NA ENSG00000268230 NA
CDH2 1000 This gene encodes a classical cadherin and member of the cadherin superfamily. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate a calcium-dependent cell adhesion molecule and glycoprotein. This protein plays a role in the establishment of left-right asymmetry, development of the nervous system and the formation of cartilage and bone. ENSG00000170558 cadherin 2
MYH6 4624 Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. ENSG00000197616 myosin, heavy chain 6, cardiac muscle, alpha
VTN 7448 The protein encoded by this gene is a member of the pexin family. It is found in serum and tissues and promotes cell adhesion and spreading, inhibits the membrane-damaging effect of the terminal cytolytic complement pathway, and binds to several serpin serine protease inhibitors. It is a secreted protein and exists in either a single chain form or a clipped, two chain form held together by a disulfide bond. ENSG00000109072 vitronectin
PPP1R1B 84152 This gene encodes a bifunctional signal transduction molecule. Dopaminergic and glutamatergic receptor stimulation regulates its phosphorylation and function as a kinase or phosphatase inhibitor. As a target for dopamine, this gene may serve as a therapeutic target for neurologic and psychiatric disorders. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000131771 protein phosphatase 1 regulatory inhibitor subunit 1B
AP2B1 163 The protein encoded by this gene is one of two large chain components of the assembly protein complex 2, which serves to link clathrin to receptors in coated vesicles. The encoded protein is found on the cytoplasmic face of coated vesicles in the plasma membrane. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000006125 adaptor related protein complex 2 beta 1 subunit
ARG1 383 Arginase catalyzes the hydrolysis of arginine to ornithine and urea. At least two isoforms of mammalian arginase exist (types I and II) which differ in their tissue distribution, subcellular localization, immunologic crossreactivity and physiologic function. The type I isoform encoded by this gene, is a cytosolic enzyme and expressed predominantly in the liver as a component of the urea cycle. Inherited deficiency of this enzyme results in argininemia, an autosomal recessive disorder characterized by hyperammonemia. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000118520 arginase 1
SCD 6319 This gene encodes an enzyme involved in fatty acid biosynthesis, primarily the synthesis of oleic acid. The protein belongs to the fatty acid desaturase family and is an integral membrane protein located in the endoplasmic reticulum. Transcripts of approximately 3.9 and 5.2 kb, differing only by alternative polyadenylation signals, have been detected. A gene encoding a similar enzyme is located on chromosome 4 and a pseudogene of this gene is located on chromosome 17. ENSG00000099194 stearoyl-CoA desaturase
TKT 7086 This gene encodes a thiamine-dependent enzyme which plays a role in the channeling of excess sugar phosphates to glycolysis in the pentose phosphate pathway. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000163931 transketolase
COL6A2 1292 This gene encodes one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The product of this gene contains several domains similar to von Willebrand Factor type A domains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in this gene are associated with Bethlem myopathy and Ullrich scleroatonic muscular dystrophy. Three transcript variants have been identified for this gene. ENSG00000142173 collagen type VI alpha 2
TPM2 7169 This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000198467 tropomyosin 2 (beta)
MT1G 4495 NA ENSG00000125144 metallothionein 1G
CADM3-AS1 ENSG00000225670 NA ENSG00000225670 CADM3 antisense RNA 1
CYP3A5 1577 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. The encoded protein metabolizes drugs as well as the steroid hormones testosterone and progesterone. This gene is part of a cluster of cytochrome P450 genes on chromosome 7q21.1. Two pseudogenes of this gene have been identified within this cluster on chromosome 7. Expression of this gene is widely variable among populations, and a single nucleotide polymorphism that affects transcript splicing has been associated with susceptibility to hypertension. Alternative splicing results in multiple transcript variants. ENSG00000106258 cytochrome P450 family 3 subfamily A member 5
MAP1A 4130 This gene encodes a protein that belongs to the microtubule-associated protein family. The proteins of this family are thought to be involved in microtubule assembly, which is an essential step in neurogenesis. The product of this gene is a precursor polypeptide that presumably undergoes proteolytic processing to generate the final MAP1A heavy chain and LC2 light chain. Expression of this gene is almost exclusively in the brain. Studies of the rat microtubule-associated protein 1A gene suggested a role in early events of spinal cord development. ENSG00000166963 microtubule associated protein 1A
FXYD6 53826 This gene encodes a member of the FXYD family of transmembrane proteins. This particular protein encodes phosphohippolin, which likely affects the activity of Na,K-ATPase. Multiple alternatively spliced transcript variants encoding the same protein have been described. Related pseudogenes have been identified on chromosomes 10 and X. Read-through transcripts have been observed between this locus and the downstream sodium/potassium-transporting ATPase subunit gamma (FXYD2, GeneID 486) locus. ENSG00000137726 FXYD domain containing ion transport regulator 6
CELA3A 10136 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. ENSG00000142789 chymotrypsin like elastase family member 3A
PMP22 5376 This gene encodes an integral membrane protein that is a major component of myelin in the peripheral nervous system. Studies suggest two alternately used promoters drive tissue-specific expression. Various mutations of this gene are causes of Charcot-Marie-Tooth disease Type IA, Dejerine-Sottas syndrome, and hereditary neuropathy with liability to pressure palsies. Alternative splicing results in multiple transcript variants. ENSG00000109099 peripheral myelin protein 22
KRT13 3860 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. ENSG00000171401 keratin 13
DNM1 1759 This gene encodes a member of the dynamin subfamily of GTP-binding proteins. The encoded protein possesses unique mechanochemical properties used to tubulate and sever membranes, and is involved in clathrin-mediated endocytosis and other vesicular trafficking processes. Actin and other cytoskeletal proteins act as binding partners for the encoded protein, which can also self-assemble leading to stimulation of GTPase activity. More than sixty highly conserved copies of the 3’ region of this gene are found elsewhere in the genome, particularly on chromosomes Y and 15. Alternatively spliced transcript variants encoding different isoforms have been described. ENSG00000106976 dynamin 1
ACTG2 72 Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. ENSG00000163017 actin, gamma 2, smooth muscle, enteric
# Save the queried Ensembl gene IDs for factor 2, one ID per line.
write.table(out$query,
            paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 2, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
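
The write step above is repeated for each factor, with only the factor index changing. A minimal sketch of a reusable helper follows. The function name `write_factor_genes` and its arguments are hypothetical and not part of the original analysis. The toy `annot` data frame stands in for a `queryMany()` result, and the file goes to a temporary directory instead of `../utilities/`.

```r
# Hypothetical helper (not from the original analysis): write the query IDs
# of one factor's annotation table to gene_names_clus_<k>.txt in out_dir.
write_factor_genes <- function(annot, k, out_dir) {
  out_file <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(as.character(annot$query), out_file,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(out_file)  # return the path so callers can check or reuse it
}

# Toy stand-in for a queryMany() result (assumption: it has a `query` column).
annot <- data.frame(query  = c("ENSG00000197971", "ENSG00000092054"),
                    symbol = c("MBP", "MYH7"),
                    stringsAsFactors = FALSE)

out_file <- write_factor_genes(annot, k = 3, out_dir = tempdir())
written  <- readLines(out_file)
```

In the report itself, the call would presumably take the `out` object returned by `mygene::queryMany()` and the `../utilities/GTEX2013_sparse_fac_sqrt` directory, so that the factor index is passed once and cannot drift from the section heading.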

Factor 3 Annotations

out <- mygene::queryMany(gene_list[3,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query symbol X_id summary name notfound
ENSG00000197971 MBP 4155 The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. myelin basic protein NA
ENSG00000092054 MYH7 4625 Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. myosin, heavy chain 7, cardiac muscle, beta NA
ENSG00000266844 RP11-862L9.3 ENSG00000266844 NA NA NA
ENSG00000087086 FTL 2512 This gene encodes the light subunit of the ferritin protein. Ferritin is the major intracellular iron storage protein in prokaryotes and eukaryotes. It is composed of 24 subunits of the heavy and light ferritin chains. Variation in ferritin subunit composition may affect the rates of iron uptake and release in different tissues. A major function of ferritin is the storage of iron in a soluble and nontoxic state. Defects in this light chain ferritin gene are associated with several neurodegenerative diseases and hyperferritinemia-cataract syndrome. This gene has multiple pseudogenes. ferritin, light polypeptide NA
ENSG00000175084 DES 1674 This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. desmin NA
ENSG00000133392 MYH11 4629 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. myosin, heavy chain 11, smooth muscle NA
ENSG00000131095 GFAP 2670 This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. glial fibrillary acidic protein NA
ENSG00000115705 TPO 7173 This gene encodes a membrane-bound glycoprotein. The encoded protein acts as an enzyme and plays a central role in thyroid gland function. The protein functions in the iodination of tyrosine residues in thyroglobulin and phenoxy-ester formation between pairs of iodinated tyrosines to generate the thyroid hormones, thyroxine and triiodothyronine. Mutations in this gene are associated with several disorders of thyroid hormonogenesis, including congenital hypothyroidism, congenital goiter, and thyroid hormone organification defect IIA. Multiple transcript variants encoding distinct isoforms have been identified for this gene, but the full-length nature of some variants has not been determined. thyroid peroxidase NA
ENSG00000042832 TG 7038 Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. thyroglobulin NA
ENSG00000160882 CYP11B1 1584 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the mitochondrial inner membrane and is involved in the conversion of progesterone to cortisol in the adrenal cortex. Mutations in this gene cause congenital adrenal hyperplasia due to 11-beta-hydroxylase deficiency. Transcript variants encoding different isoforms have been noted for this gene. cytochrome P450 family 11 subfamily B member 1 NA
ENSG00000197249 SERPINA1 5265 The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. serpin family A member 1 NA
ENSG00000115461 IGFBP5 3488 NA insulin like growth factor binding protein 5 NA
ENSG00000186847 KRT14 3861 This gene encodes a member of the keratin family, the most diverse group of intermediate filaments. This gene product, a type I keratin, is usually found as a heterotetramer with two keratin 5 molecules, a type II keratin. Together they form the cytoskeleton of epithelial cells. Mutations in the genes for these keratins are associated with epidermolysis bullosa simplex. At least one pseudogene has been identified at 17p12-p11. keratin 14 NA
ENSG00000198121 LPAR1 1902 The integral membrane protein encoded by this gene is a lysophosphatidic acid (LPA) receptor from a group known as EDG receptors. These receptors are members of the G protein-coupled receptor superfamily. Utilized by LPA for cell signaling, EDG receptors mediate diverse biologic functions, including proliferation, platelet aggregation, smooth muscle contraction, inhibition of neuroblastoma cell differentiation, chemotaxis, and tumor cell invasion. Two transcript variants encoding the same protein have been identified for this gene. lysophosphatidic acid receptor 1 NA
ENSG00000148795 CYP17A1 1586 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. It has both 17alpha-hydroxylase and 17,20-lyase activities and is a key enzyme in the steroidogenic pathway that produces progestins, mineralocorticoids, glucocorticoids, androgens, and estrogens. Mutations in this gene are associated with isolated steroid-17 alpha-hydroxylase deficiency, 17-alpha-hydroxylase/17,20-lyase deficiency, pseudohermaphroditism, and adrenal hyperplasia. cytochrome P450 family 17 subfamily A member 1 NA
ENSG00000107796 ACTA2 59 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in smooth muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. actin, alpha 2, smooth muscle, aorta NA
ENSG00000163631 ALB 213 Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. albumin NA
ENSG00000151552 QDPR 5860 This gene encodes the enzyme dihydropteridine reductase, which catalyzes the NADH-mediated reduction of quinonoid dihydrobiopterin. This enzyme is an essential component of the pterin-dependent aromatic amino acid hydroxylating systems. Mutations in this gene resulting in QDPR deficiency include aberrant splicing, amino acid substitutions, insertions, or premature terminations. Dihydropteridine reductase deficiency presents as atypical phenylketonuria due to insufficient production of biopterin, a cofactor for phenylalanine hydroxylase. quinoid dihydropteridine reductase NA
ENSG00000173432 SAA1 6288 This gene encodes a member of the serum amyloid A family of apolipoproteins. The encoded preproprotein is proteolytically processed to generate the mature protein. This protein is a major acute phase protein that is highly expressed in response to inflammation and tissue injury. This protein also plays an important role in HDL metabolism and cholesterol homeostasis. High levels of this protein are associated with chronic inflammatory diseases including atherosclerosis, rheumatoid arthritis, Alzheimer’s disease and Crohn’s disease. This protein may also be a potential biomarker for certain tumors. Alternate splicing results in multiple transcript variants that encode the same protein. A pseudogene of this gene is found on chromosome 11. serum amyloid A1 NA
ENSG00000118194 TNNT2 7139 The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms, however, the full-length nature of some of these variants has not yet been determined. troponin T2, cardiac type NA
ENSG00000189058 APOD 347 This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. apolipoprotein D NA
ENSG00000138207 RBP4 5950 This protein belongs to the lipocalin family and is the specific carrier for retinol (vitamin A alcohol) in the blood. It delivers retinol from the liver stores to the peripheral tissues. In plasma, the RBP-retinol complex interacts with transthyretin which prevents its loss by filtration through the kidney glomeruli. A deficiency of vitamin A blocks secretion of the binding protein posttranslationally and results in defective delivery and supply to the epidermal cells. retinol binding protein 4 NA
ENSG00000244734 HBB 3043 The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. hemoglobin subunit beta NA
ENSG00000163017 ACTG2 72 Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. actin, gamma 2, smooth muscle, enteric NA
ENSG00000107331 ABCA2 20 The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intracellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the ABC1 subfamily. Members of the ABC1 subfamily comprise the only major ABC subfamily found exclusively in multicellular eukaryotes. This protein is highly expressed in brain tissue and may play a role in macrophage lipid metabolism and neural development. Two transcript variants encoding different isoforms have been found for this gene. ATP binding cassette subfamily A member 2 NA
ENSG00000134339 SAA2 6289 NA serum amyloid A2 NA
ENSG00000198959 TGM2 7052 Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene acts as a monomer, is induced by retinoic acid, and appears to be involved in apoptosis. Finally, the encoded protein is the autoantigen implicated in celiac disease. Two transcript variants encoding different isoforms have been found for this gene. transglutaminase 2 NA
ENSG00000047849 MAP4 4134 The protein encoded by this gene is a major non-neuronal microtubule-associated protein. This protein contains a domain similar to the microtubule-binding domains of neuronal microtubule-associated protein (MAP2) and microtubule-associated protein tau (MAPT/TAU). This protein promotes microtubule assembly, and has been shown to counteract destabilization of interphase microtubule catastrophe promotion. Cyclin B was found to interact with this protein, which targets cell division cycle 2 (CDC2) kinase to microtubules. The phosphorylation of this protein affects microtubule properties and cell cycle progression. Multiple transcript variants encoding different isoforms have been found for this gene. microtubule associated protein 4 NA
ENSG00000106366 SERPINE1 5054 This gene encodes a member of the serine proteinase inhibitor (serpin) superfamily. This member is the principal inhibitor of tissue plasminogen activator (tPA) and urokinase (uPA), and hence is an inhibitor of fibrinolysis. Defects in this gene are the cause of plasminogen activator inhibitor-1 deficiency (PAI-1 deficiency), and high concentrations of the gene product are associated with thrombophilia. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. serpin family E member 1 NA
ENSG00000170421 KRT8 3856 This gene is a member of the type II keratin family clustered on the long arm of chromosome 12. Type I and type II keratins heteropolymerize to form intermediate-sized filaments in the cytoplasm of epithelial cells. The product of this gene typically dimerizes with keratin 18 to form an intermediate filament in simple single-layered epithelial cells. This protein plays a role in maintaining cellular structural integrity and also functions in signal transduction and cellular differentiation. Mutations in this gene cause cryptogenic cirrhosis. Alternatively spliced transcript variants have been found for this gene. keratin 8 NA
ENSG00000197616 MYH6 4624 Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. myosin, heavy chain 6, cardiac muscle, alpha NA
ENSG00000269936 RP11-394O4.5 ENSG00000269936 NA NA NA
ENSG00000133710 SPINK5 11005 This gene encodes a multidomain serine protease inhibitor that contains 15 potential inhibitory domains. The encoded preproprotein is proteolytically processed to generate multiple protein products, which may exhibit unique activities and specificities. These proteins may play a role in skin and hair morphogenesis, as well as anti-inflammatory and antimicrobial protection of mucous epithelia. Mutations in this gene may result in Netherton syndrome, a disorder characterized by ichthyosis, defective cornification, and atopy. This gene is present in a gene cluster on chromosome 5. Alternative splicing results in multiple transcript variants. serine peptidase inhibitor, Kazal type 5 NA
ENSG00000158887 MPZ 4359 This gene is specifically expressed in Schwann cells of the peripheral nervous system and encodes a type I transmembrane glycoprotein that is a major structural protein of the peripheral myelin sheath. The encoded protein contains a large hydrophobic extracellular domain and a smaller basic intracellular domain, which are essential for the formation and stabilization of the multilamellar structure of the compact myelin. Mutations in this gene are associated with autosomal dominant form of Charcot-Marie-Tooth disease type 1 (CMT1B) and other polyneuropathies, such as Dejerine-Sottas syndrome (DSS) and congenital hypomyelinating neuropathy (CHN). A recent study showed that two isoforms are produced from the same mRNA by use of alternative in-frame translation termination codons via a stop codon readthrough mechanism. myelin protein zero NA
ENSG00000135821 GLUL 2752 The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. glutamate-ammonia ligase NA
ENSG00000255071 SAA2-SAA4 100528017 This locus represents naturally occurring read-through transcription between the neighboring serum amyloid A2 and serum amyloid A4 genes on chromosome 11. The read-through transcript produces a fusion protein that shares sequence identity with each individual gene product. SAA2-SAA4 readthrough NA
ENSG00000165795 NDRG2 57447 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that may play a role in neurite outgrowth. This gene may be involved in glioblastoma carcinogenesis. Several alternatively spliced transcript variants of this gene have been described, but the full-length nature of some of these variants has not been determined. NDRG family member 2 NA
ENSG00000136872 ALDOB 229 Fructose-1,6-bisphosphate aldolase (EC 4.1.2.13) is a tetrameric glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Vertebrates have 3 aldolase isozymes which are distinguished by their electrophoretic and catalytic properties. Differences indicate that aldolases A, B, and C are distinct proteins, the products of a family of related ‘housekeeping’ genes exhibiting developmentally regulated expression of the different isozymes. The developing embryo produces aldolase A, which is produced in even greater amounts in adult muscle where it can be as much as 5% of total cellular protein. In adult liver, kidney and intestine, aldolase A expression is repressed and aldolase B is produced. In brain and other nervous tissue, aldolase A and C are expressed about equally. There is a high degree of homology between aldolase A and C. Defects in ALDOB cause hereditary fructose intolerance. aldolase, fructose-bisphosphate B NA
ENSG00000106631 MYL7 58498 NA myosin light chain 7 NA
ENSG00000140545 MFGE8 4240 This gene encodes a preproprotein that is proteolytically processed to form multiple protein products. The major encoded protein product, lactadherin, is a membrane glycoprotein that promotes phagocytosis of apoptotic cells. This protein has also been implicated in wound healing, autoimmune disease, and cancer. Lactadherin can be further processed to form a smaller cleavage product, medin, which comprises the major protein component of aortic medial amyloid (AMA). Alternative splicing results in multiple transcript variants. milk fat globule-EGF factor 8 protein NA
ENSG00000257017 HP 3240 This gene encodes a preproprotein, which is processed to yield both alpha and beta chains, which subsequently combine as a tetramer to produce haptoglobin. Haptoglobin functions to bind free plasma hemoglobin, which allows degradative enzymes to gain access to the hemoglobin, while at the same time preventing loss of iron through the kidneys and protecting the kidneys from damage by hemoglobin. Mutations in this gene and/or its regulatory regions cause ahaptoglobinemia or hypohaptoglobinemia. This gene has also been linked to diabetic nephropathy, the incidence of coronary artery disease in type 1 diabetes, Crohn’s disease, inflammatory disease behavior, primary sclerosing cholangitis, susceptibility to idiopathic Parkinson’s disease, and a reduced incidence of Plasmodium falciparum malaria. The protein encoded also exhibits antimicrobial activity against bacteria. A similar duplicated gene is located next to this gene on chromosome 16. Multiple transcript variants encoding different isoforms have been found for this gene. haptoglobin NA
ENSG00000111245 MYL2 4633 This gene encodes the regulatory light chain associated with cardiac myosin beta (or slow) heavy chain. Ca2+ triggers the phosphorylation of regulatory light chain that in turn triggers contraction. Mutations in this gene are associated with mid-left ventricular chamber type hypertrophic cardiomyopathy. myosin light chain 2 NA
ENSG00000188536 HBA2 3040 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. hemoglobin subunit alpha 2 NA
ENSG00000149557 FEZ1 9638 This gene is an ortholog of the C. elegans unc-76 gene, which is necessary for normal axonal bundling and elongation within axon bundles. Expression of this gene in C. elegans unc-76 mutants can restore to the mutants partial locomotion and axonal fasciculation, suggesting that it also functions in axonal outgrowth. The N-terminal half of the gene product is highly acidic. Alternatively spliced transcript variants encoding different isoforms of this protein have been described. fasciculation and elongation protein zeta 1 NA
ENSG00000187239 FNBP1 23048 The protein encoded by this gene is a member of the formin-binding-protein family. The protein contains an N-terminal Fer/Cdc42-interacting protein 4 (CIP4) homology (FCH) domain followed by a coiled-coil domain, a proline-rich motif, a second coiled-coil domain, a Rho family protein-binding domain (RBD), and a C-terminal SH3 domain. This protein binds sorting nexin 2 (SNX2), tankyrase (TNKS), and dynamin; an interaction between this protein and formin has not been demonstrated yet in human. formin binding protein 1 NA
ENSG00000164692 COL1A2 1278 This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. collagen type I alpha 2 chain NA
ENSG00000237973 MTCO1P12 ENSG00000237973 NA MT-CO1 pseudogene 12 NA
ENSG00000206172 HBA1 3039 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. hemoglobin subunit alpha 1 NA
ENSG00000140416 TPM1 7168 This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. tropomyosin 1 (alpha) NA
ENSG00000165424 ZCCHC24 219654 NA zinc finger CCHC-type containing 24 NA
ENSG00000169554 ZEB2 9839 The protein encoded by this gene is a member of the Zfh1 family of 2-handed zinc finger/homeodomain proteins. It is located in the nucleus and functions as a DNA-binding transcriptional repressor that interacts with activated SMADs. Mutations in this gene are associated with Hirschsprung disease/Mowat-Wilson syndrome. Alternatively spliced transcript variants have been found for this gene. zinc finger E-box binding homeobox 2 NA
ENSG00000119280 C1orf198 84886 NA chromosome 1 open reading frame 198 NA
ENSG00000073060 SCARB1 949 The protein encoded by this gene is a plasma membrane receptor for high density lipoprotein cholesterol (HDL). The encoded protein mediates cholesterol transfer to and from HDL. In addition, this protein is a receptor for hepatitis C virus glycoprotein E2. Two transcript variants encoding different isoforms have been found for this gene. scavenger receptor class B member 1 NA
ENSG00000160796 NBEAL2 23218 The protein encoded by this gene contains a beige and Chediak-Higashi (BEACH) domain and multiple WD40 domains, and may play a role in megakaryocyte alpha-granule biogenesis. Mutations in this gene are a cause of gray platelet syndrome. neurobeachin like 2 NA
ENSG00000117289 NA NA NA NA TRUE
ENSG00000172915 NBEA 26960 This gene encodes a member of a large, diverse group of A-kinase anchor proteins that target the activity of protein kinase A to specific subcellular sites by binding to its type II regulatory subunits. Brain-specific expression and coat protein-like membrane recruitment of a highly similar protein in mouse suggest an involvement in neuronal post-Golgi membrane traffic. Mutations in this gene may be associated with a form of autism. This gene and its expression are frequently disrupted in patients with multiple myeloma. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Additional transcript variants may exist, but their full-length nature has not been determined. neurobeachin NA
ENSG00000109846 CRYAB 1410 Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. crystallin alpha B NA
ENSG00000162734 PEA15 8682 This gene encodes a death effector domain-containing protein that functions as a negative regulator of apoptosis. The encoded protein is an endogenous substrate for protein kinase C. This protein is also overexpressed in type 2 diabetes mellitus, where it may contribute to insulin resistance in glucose uptake. Alternative splicing results in multiple transcript variants. phosphoprotein enriched in astrocytes 15 NA
ENSG00000118816 CCNI 10983 The protein encoded by this gene belongs to the highly conserved cyclin family, whose members are characterized by a dramatic periodicity in protein abundance through the cell cycle. Cyclins function as regulators of CDK kinases. Different cyclins exhibit distinct expression and degradation patterns which contribute to the temporal coordination of each mitotic event. This cyclin shows the highest similarity with cyclin G. The transcript of this gene was found to be expressed constantly during cell cycle progression. The function of this cyclin has not yet been determined. cyclin I NA
ENSG00000166925 TSC22D4 81628 TSC22D4 is a member of the TSC22 domain family of leucine zipper transcriptional regulators (see TSC22D3; MIM 300506) (Kester et al., 1999 [PubMed 10488076]; Fiorenza et al., 2001 [PubMed 11707329]). TSC22 domain family member 4 NA
ENSG00000064393 HIPK2 28996 This gene encodes a conserved serine/threonine kinase that is a member of the homeodomain-interacting protein kinase family. The encoded protein interacts with homeodomain transcription factors and many other transcription factors such as p53, and can function as both a corepressor and a coactivator depending on the transcription factor and its subcellular localization. Multiple transcript variants encoding different isoforms have been found for this gene. homeodomain interacting protein kinase 2 NA
ENSG00000172270 BSG 682 The protein encoded by this gene is a plasma membrane protein that is important in spermatogenesis, embryo implantation, neural network formation, and tumor progression. The encoded protein is also a member of the immunoglobulin superfamily. Multiple transcript variants encoding different isoforms have been found for this gene. basigin (Ok blood group) NA
ENSG00000011465 DCN 1634 This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. decorin NA
ENSG00000229314 ORM1 5004 This gene encodes a key acute phase plasma protein. Because of its increase due to acute inflammation, this protein is classified as an acute-phase reactant. The specific function of this protein has not yet been determined; however, it may be involved in aspects of immunosuppression. orosomucoid 1 NA
ENSG00000186395 KRT10 3858 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. keratin 10 NA
ENSG00000171236 LRG1 116844 The leucine-rich repeat (LRR) family of proteins, including LRG1, have been shown to be involved in protein-protein interaction, signal transduction, and cell adhesion and development. LRG1 is expressed during granulocyte differentiation (O’Donnell et al., 2002 [PubMed 12223515]). leucine rich alpha-2-glycoprotein 1 NA
ENSG00000106624 AEBP1 165 This gene encodes a member of carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. AE binding protein 1 NA
ENSG00000118729 CASQ2 845 The protein encoded by this gene specifies the cardiac muscle family member of the calsequestrin family. Calsequestrin is localized to the sarcoplasmic reticulum in cardiac and slow skeletal muscle cells. The protein is a calcium binding protein that stores calcium for muscle function. Mutations in this gene cause stress-induced polymorphic ventricular tachycardia, also referred to as catecholaminergic polymorphic ventricular tachycardia 2 (CPVT2), a disease characterized by bidirectional ventricular tachycardia that may lead to cardiac arrest. calsequestrin 2 NA
ENSG00000100994 PYGB 5834 The protein encoded by this gene is a glycogen phosphorylase found predominantly in the brain. The encoded protein forms homodimers which can associate into homotetramers, the enzymatically active form of glycogen phosphorylase. The activity of this enzyme is positively regulated by AMP and negatively regulated by ATP, ADP, and glucose-6-phosphate. This enzyme catalyzes the rate-determining step in glycogen degradation. phosphorylase, glycogen; brain NA
ENSG00000036448 MYOM2 9172 The giant protein titin, together with its associated proteins, interconnects the major structure of sarcomeres, the M bands and Z discs. The C-terminal end of the titin string extends into the M line, where it binds tightly to M-band constituents of apparent molecular masses of 190 kD and 165 kD. The predicted MYOM2 protein contains 1,465 amino acids. Like MYOM1, MYOM2 has a unique N-terminal domain followed by 12 repeat domains with strong homology to either fibronectin type III or immunoglobulin C2 domains. Protein sequence comparisons suggested that the MYOM2 protein and bovine M protein are identical. myomesin 2 NA
ENSG00000173991 TCAP 8557 Sarcomere assembly is regulated by the muscle protein titin. Titin is a giant elastic protein with kinase activity that extends half the length of a sarcomere. It serves as a scaffold to which myofibrils and other muscle related proteins are attached. This gene encodes a protein found in striated and cardiac muscle that binds to the titin Z1-Z2 domains and is a substrate of titin kinase, interactions thought to be critical to sarcomere assembly. Mutations in this gene are associated with limb-girdle muscular dystrophy type 2G. titin-cap NA
ENSG00000111640 GAPDH 2597 This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. glyceraldehyde-3-phosphate dehydrogenase NA
ENSG00000109061 MYH1 4619 Myosin is a major contractile protein which converts chemical energy into mechanical energy through the hydrolysis of ATP. Myosin is a hexameric protein composed of a pair of myosin heavy chains (MYH) and two pairs of nonidentical light chains. Myosin heavy chains are encoded by a multigene family. In mammals at least 10 different myosin heavy chain (MYH) isoforms have been described from striated, smooth, and nonmuscle cells. These isoforms show expression that is spatially and temporally regulated during development. myosin, heavy chain 1, skeletal muscle, adult NA
ENSG00000131471 AOC3 8639 This gene encodes a member of the semicarbazide-sensitive amine oxidase family. Copper amine oxidases catalyze the oxidative conversion of amines to aldehydes in the presence of copper and quinone cofactor. The encoded protein is localized to the cell surface, has adhesive properties as well as monoamine oxidase activity, and may be involved in leukocyte trafficking. Alterations in levels of the encoded protein may be associated with many diseases, including diabetes mellitus. A pseudogene of this gene has been described and is located approximately 9-kb downstream on the same chromosome. Alternative splicing results in multiple transcript variants. amine oxidase, copper containing 3 NA
ENSG00000121653 MAPK8IP1 9479 This gene encodes a regulator of the pancreatic beta-cell function. It is highly similar to JIP-1, a mouse protein known to be a regulator of c-Jun amino-terminal kinase (Mapk8). This protein has been shown to prevent MAPK8 mediated activation of transcription factors, and to decrease IL-1 beta and MAP kinase kinase 1 (MEKK1) induced apoptosis in pancreatic beta cells. This protein also functions as a DNA-binding transactivator of the glucose transporter GLUT2. RE1-silencing transcription factor (REST) is reported to repress the expression of this gene in insulin-secreting beta cells. This gene is found to be mutated in a type 2 diabetes family, and thus is thought to be a susceptibility gene for type 2 diabetes. mitogen-activated protein kinase 8 interacting protein 1 NA
ENSG00000092820 EZR 7430 The cytoplasmic peripheral membrane protein encoded by this gene functions as a protein-tyrosine kinase substrate in microvilli. As a member of the ERM protein family, this protein serves as an intermediate between the plasma membrane and the actin cytoskeleton. This protein plays a key role in cell surface structure adhesion, migration and organization, and it has been implicated in various human cancers. A pseudogene located on chromosome 3 has been identified for this gene. Alternatively spliced variants have also been described for this gene. ezrin NA
ENSG00000117525 F3 2152 This gene encodes coagulation factor III which is a cell surface glycoprotein. This factor enables cells to initiate the blood coagulation cascades, and it functions as the high-affinity receptor for the coagulation factor VII. The resulting complex provides a catalytic event that is responsible for initiation of the coagulation protease cascades by specific limited proteolysis. Unlike the other cofactors of these protease cascades, which circulate as nonfunctional precursors, this factor is a potent initiator that is fully functional when expressed on cell surfaces. There are 3 distinct domains of this factor: extracellular, transmembrane, and cytoplasmic. This protein is the only one in the coagulation pathway for which a congenital deficiency has not been described. Alternate splicing results in multiple transcript variants. coagulation factor III, tissue factor NA
ENSG00000129116 PALLD 23022 This gene encodes a cytoskeletal protein that is required for organizing the actin cytoskeleton. The protein is a component of actin-containing microfilaments, and it is involved in the control of cell shape, adhesion, and contraction. Polymorphisms in this gene are associated with a susceptibility to pancreatic cancer type 1, and also with a risk for myocardial infarction. Alternative splicing results in multiple transcript variants. palladin, cytoskeletal associated protein NA
ENSG00000198125 MB 4151 This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. myoglobin NA
ENSG00000132470 ITGB4 3691 Integrins are heterodimers comprised of alpha and beta subunits, that are noncovalently associated transmembrane glycoprotein receptors. Different combinations of alpha and beta polypeptides form complexes that vary in their ligand-binding specificities. Integrins mediate cell-matrix or cell-cell adhesion, and transduced signals that regulate gene expression and cell growth. This gene encodes the integrin beta 4 subunit, a receptor for the laminins. This subunit tends to associate with alpha 6 subunit and is likely to play a pivotal role in the biology of invasive carcinoma. Mutations in this gene are associated with epidermolysis bullosa with pyloric atresia. Multiple alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. integrin subunit beta 4 NA
ENSG00000115386 REG1A 5967 This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. regenerating family member 1 alpha NA
ENSG00000171992 SYNPO 11346 Synaptopodin is an actin-associated protein that may play a role in actin-based cell shape and motility. The name synaptopodin derives from the protein’s associations with postsynaptic densities and dendritic spines and with renal podocytes (Mundel et al., 1997 [PubMed 9314539]). synaptopodin NA
ENSG00000173641 HSPB7 27129 NA heat shock protein family B (small) member 7 NA
ENSG00000114993 RTKN 6242 This gene encodes a scaffold protein that interacts with GTP-bound Rho proteins. Binding of this protein inhibits the GTPase activity of Rho proteins. This protein may interfere with the conversion of active, GTP-bound Rho to the inactive GDP-bound form by RhoGAP. Rho proteins regulate many important cellular processes, including cytokinesis, transcription, smooth muscle contraction, cell growth and transformation. Dysregulation of the Rho signal transduction pathway has been implicated in many forms of cancer. Alternative splicing results in multiple transcript variants encoding different isoforms. rhotekin NA
ENSG00000160808 MYL3 4634 MYL3 encodes myosin light chain 3, an alkali light chain also referred to in the literature as both the ventricular isoform and the slow skeletal muscle isoform. Mutations in MYL3 have been identified as a cause of mid-left ventricular chamber type hypertrophic cardiomyopathy. myosin light chain 3 NA
ENSG00000160781 PAQR6 79957 NA progestin and adipoQ receptor family member 6 NA
ENSG00000171560 FGA 2243 This gene encodes the alpha subunit of the coagulation factor fibrinogen, which is a component of the blood clot. Following vascular injury, the encoded preproprotein is proteolytically processed by thrombin during the conversion of fibrinogen to fibrin. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia, afibrinogenemia and renal amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. fibrinogen alpha chain NA
ENSG00000130528 HRC 3270 This gene encodes a luminal sarcoplasmic reticulum protein identified by its ability to bind low-density lipoprotein with high affinity. The protein interacts with the cytoplasmic domain of triadin, the main transmembrane protein of the junctional sarcoplasmic reticulum (SR) of skeletal muscle. The protein functions in the regulation of releasable calcium into the SR. histidine rich calcium binding protein NA
ENSG00000197756 RPL37A 6168 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L37AE family of ribosomal proteins. It is located in the cytoplasm. The protein contains a C4-type zinc finger-like domain. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. ribosomal protein L37a NA
ENSG00000164309 CMYA5 202333 NA cardiomyopathy associated 5 NA
ENSG00000197893 NRAP 4892 NA nebulin related anchoring protein NA
ENSG00000148677 ANKRD1 27063 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. ankyrin repeat domain 1 NA
ENSG00000175899 A2M 2 Alpha-2-macroglobulin is a protease inhibitor and cytokine transporter. It inhibits many proteases, including trypsin, thrombin and collagenase. A2M is implicated in Alzheimer disease (AD) due to its ability to mediate the clearance and degradation of A-beta, the major component of beta-amyloid deposits. alpha-2-macroglobulin NA
ENSG00000100364 KIAA0930 23313 NA KIAA0930 NA
ENSG00000115457 IGFBP2 3485 The protein encoded by this gene is one of six similar proteins that bind insulin-like growth factors I and II (IGF-I and IGF-II). The encoded protein can be secreted into the bloodstream, where it binds IGF-I and IGF-II with high affinity, or it can remain intracellular, interacting with many different ligands. High expression levels of this protein promote the growth of several types of tumors and may be predictive of the chances of recovery of the patient. Several transcript variants, one encoding a secreted isoform and the others encoding nonsecreted isoforms, have been found for this gene. insulin like growth factor binding protein 2 NA
ENSG00000135218 CD36 948 The protein encoded by this gene is the fourth major glycoprotein of the platelet surface and serves as a receptor for thrombospondin in platelets and various cell lines. Since thrombospondins are widely distributed proteins involved in a variety of adhesive processes, this protein may have important functions as a cell adhesion molecule. It binds to collagen, thrombospondin, anionic phospholipids and oxidized LDL. It directly mediates cytoadherence of Plasmodium falciparum parasitized erythrocytes and it binds long chain fatty acids and may function in the transport and/or as a regulator of fatty acid transport. Mutations in this gene cause platelet glycoprotein deficiency. Multiple alternatively spliced transcript variants have been found for this gene. CD36 molecule NA
ENSG00000140181 NA NA NA NA TRUE
ENSG00000173786 CNP 1267 NA 2’,3’-cyclic nucleotide 3’ phosphodiesterase NA
ENSG00000256545 NA NA NA NA TRUE
ENSG00000172023 REG1B 5968 This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. regenerating family member 1 beta NA
# Save the queried Ensembl gene IDs for factor 3, one ID per line.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 3, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
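
The export step above hard-codes the factor index in the file name. A minimal sketch of how the same export could be wrapped in a function and looped over factors is given below. The helper name `write_factor_genes`, the toy `toy_gene_list` matrix, and the temporary output directory are assumptions made for illustration and are not part of the original analysis. The sketch also writes rows of `gene_list` directly rather than `out$query`, which is assumed to hold the same IDs.

```r
# Hypothetical helper (not in the original analysis): write the gene IDs of
# factor k, one per line, to gene_names_clus_<k>.txt inside out_dir.
write_factor_genes <- function(gene_list, k, out_dir) {
  out_file <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(gene_list[k, ], out_file,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(out_file)
}

# Toy stand-in for gene_list: factors in rows, top genes in columns.
toy_gene_list <- rbind(c("ENSG00000111640", "ENSG00000075624"),
                       c("ENSG00000186395", "ENSG00000135218"))

# Write one file per factor into a temporary directory.
out_dir <- tempdir()
written <- vapply(seq_len(nrow(toy_gene_list)),
                  function(k) write_factor_genes(toy_gene_list, k, out_dir),
                  character(1))
```

In the real analysis, `out_dir` would presumably be `"../utilities/GTEX2013_sparse_fac_sqrt"` and `gene_list` the matrix built from `indices_mat`. Keeping the index as a function argument means the file name cannot drift out of step with the factor being annotated.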

Factor 4 Annotations

out <- mygene::queryMany(gene_list[4,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query symbol summary X_id name
ENSG00000173432 SAA1 This gene encodes a member of the serum amyloid A family of apolipoproteins. The encoded preproprotein is proteolytically processed to generate the mature protein. This protein is a major acute phase protein that is highly expressed in response to inflammation and tissue injury. This protein also plays an important role in HDL metabolism and cholesterol homeostasis. High levels of this protein are associated with chronic inflammatory diseases including atherosclerosis, rheumatoid arthritis, Alzheimer’s disease and Crohn’s disease. This protein may also be a potential biomarker for certain tumors. Alternate splicing results in multiple transcript variants that encode the same protein. A pseudogene of this gene is found on chromosome 11. 6288 serum amyloid A1
ENSG00000133392 MYH11 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. 4629 myosin, heavy chain 11, smooth muscle
ENSG00000184009 ACTG1 Actins are highly conserved proteins that are involved in various types of cell motility, and maintenance of the cytoskeleton. In vertebrates, three main groups of actin isoforms, alpha, beta and gamma have been identified. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton, and as mediators of internal cell motility. Actin, gamma 1, encoded by this gene, is a cytoplasmic actin found in non-muscle cells. Mutations in this gene are associated with DFNA20/26, a subtype of autosomal dominant non-syndromic sensorineural progressive hearing loss. Alternative splicing results in multiple transcript variants. 71 actin gamma 1
ENSG00000186395 KRT10 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. 3858 keratin 10
ENSG00000170323 FABP4 FABP4 encodes the fatty acid binding protein found in adipocytes. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. It is thought that FABPs’ roles include fatty acid uptake, transport, and metabolism. 2167 fatty acid binding protein 4
ENSG00000135218 CD36 The protein encoded by this gene is the fourth major glycoprotein of the platelet surface and serves as a receptor for thrombospondin in platelets and various cell lines. Since thrombospondins are widely distributed proteins involved in a variety of adhesive processes, this protein may have important functions as a cell adhesion molecule. It binds to collagen, thrombospondin, anionic phospholipids and oxidized LDL. It directly mediates cytoadherence of Plasmodium falciparum parasitized erythrocytes and it binds long chain fatty acids and may function in the transport and/or as a regulator of fatty acid transport. Mutations in this gene cause platelet glycoprotein deficiency. Multiple alternatively spliced transcript variants have been found for this gene. 948 CD36 molecule
ENSG00000067225 PKM This gene encodes a protein involved in glycolysis. The encoded protein is a pyruvate kinase that catalyzes the transfer of a phosphoryl group from phosphoenolpyruvate to ADP, generating ATP and pyruvate. This protein has been shown to interact with thyroid hormone and may mediate cellular metabolic effects induced by thyroid hormones. This protein has been found to bind Opa protein, a bacterial outer membrane protein involved in gonococcal adherence to and invasion of human cells, suggesting a role of this protein in bacterial pathogenesis. Several alternatively spliced transcript variants encoding a few distinct isoforms have been reported. 5315 pyruvate kinase, muscle
ENSG00000151726 ACSL1 The protein encoded by this gene is an isozyme of the long-chain fatty-acid-coenzyme A ligase family. Although differing in substrate specificity, subcellular localization, and tissue distribution, all isozymes of this family convert free long-chain fatty acids into fatty acyl-CoA esters, and thereby play a key role in lipid biosynthesis and fatty acid degradation. Several transcript variants encoding different isoforms have been found for this gene. 2180 acyl-CoA synthetase long-chain family member 1
ENSG00000111640 GAPDH This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. 2597 glyceraldehyde-3-phosphate dehydrogenase
ENSG00000138207 RBP4 This protein belongs to the lipocalin family and is the specific carrier for retinol (vitamin A alcohol) in the blood. It delivers retinol from the liver stores to the peripheral tissues. In plasma, the RBP-retinol complex interacts with transthyretin which prevents its loss by filtration through the kidney glomeruli. A deficiency of vitamin A blocks secretion of the binding protein posttranslationally and results in defective delivery and supply to the epidermal cells. 5950 retinol binding protein 4
ENSG00000137801 THBS1 The protein encoded by this gene is a subunit of a disulfide-linked homotrimeric protein. This protein is an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein can bind to fibrinogen, fibronectin, laminin, type V collagen and integrins alpha-V/beta-1. This protein has been shown to play roles in platelet aggregation, angiogenesis, and tumorigenesis. 7057 thrombospondin 1
ENSG00000130402 ACTN4 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, alpha actinin isoform which is concentrated in the cytoplasm, and thought to be involved in metastatic processes. Mutations in this gene have been associated with focal and segmental glomerulosclerosis. 81 actinin alpha 4
ENSG00000147872 PLIN2 The protein encoded by this gene belongs to the perilipin family, members of which coat intracellular lipid storage droplets. This protein is associated with the lipid globule surface membrane material, and may be involved in development and maintenance of adipose tissue. However, it is not restricted to adipocytes as previously thought, but is found in a wide range of cultured cell lines, including fibroblasts, endothelial and epithelial cells, and tissues, such as lactating mammary gland, adrenal cortex, Sertoli and Leydig cells, and hepatocytes in alcoholic liver cirrhosis, suggesting that it may serve as a marker of lipid accumulation in diverse cell types and diseases. Alternatively spliced transcript variants have been found for this gene. 123 perilipin 2
ENSG00000196616 ADH1B The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. 125 alcohol dehydrogenase 1B (class I), beta polypeptide
ENSG00000068976 PYGM This gene encodes a muscle enzyme involved in glycogenolysis. Highly similar enzymes encoded by different genes are found in liver and brain. Mutations in this gene are associated with McArdle disease (myophosphorylase deficiency), a glycogen storage disease of muscle. Alternative splicing results in multiple transcript variants. 5837 phosphorylase, glycogen, muscle
ENSG00000149925 ALDOA The protein encoded by this gene, Aldolase A (fructose-bisphosphate aldolase), is a glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Three aldolase isozymes (A, B, and C), encoded by three different genes, are differentially expressed during development. Aldolase A is found in the developing embryo and is produced in even greater amounts in adult muscle. Aldolase A expression is repressed in adult liver, kidney and intestine and similar to aldolase C levels in brain and other nervous tissue. Aldolase A deficiency has been associated with myopathy and hemolytic anemia. Alternative splicing and alternative promoter usage results in multiple transcript variants. Related pseudogenes have been identified on chromosomes 3 and 10. 226 aldolase, fructose-bisphosphate A
ENSG00000115386 REG1A This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. 5967 regenerating family member 1 alpha
ENSG00000075624 ACTB This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. 60 actin, beta
ENSG00000100345 MYH9 This gene encodes a conventional non-muscle myosin; this protein should not be confused with the unconventional myosin-9a or 9b (MYO9A or MYO9B). The encoded protein is a myosin IIA heavy chain that contains an IQ domain and a myosin head-like domain which is involved in several important functions, including cytokinesis, cell motility and maintenance of cell shape. Defects in this gene have been associated with non-syndromic sensorineural deafness autosomal dominant type 17, Epstein syndrome, Alport syndrome with macrothrombocytopenia, Sebastian syndrome, Fechtner syndrome and macrothrombocytopenia with progressive sensorineural deafness. 4627 myosin, heavy chain 9, non-muscle
ENSG00000169347 GP2 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. 2813 glycoprotein 2
ENSG00000107796 ACTA2 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. 59 actin, alpha 2, smooth muscle, aorta
ENSG00000178719 GRINA NA 2907 glutamate ionotropic receptor NMDA type subunit associated protein 1
ENSG00000175445 LPL LPL encodes lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue. LPL functions as a homodimer, and has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Severe mutations that cause LPL deficiency result in type I hyperlipoproteinemia, while less extreme mutations in LPL are linked to many disorders of lipoprotein metabolism. 4023 lipoprotein lipase
ENSG00000163220 S100A9 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. 6280 S100 calcium binding protein A9
ENSG00000121310 ECHDC2 NA 55268 enoyl-CoA hydratase domain containing 2
ENSG00000188536 HBA2 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. 3040 hemoglobin subunit alpha 2
ENSG00000125414 MYH2 Myosins are actin-based motor proteins that function in the generation of mechanical force in eukaryotic cells. Muscle myosins are heterohexamers composed of 2 myosin heavy chains and 2 pairs of nonidentical myosin light chains. This gene encodes a member of the class II or conventional myosin heavy chains, and functions in skeletal muscle contraction. This gene is found in a cluster of myosin heavy chain genes on chromosome 17. A mutation in this gene results in inclusion body myopathy-3. Multiple alternatively spliced variants, encoding the same protein, have been identified. 4620 myosin, heavy chain 2, skeletal muscle, adult
ENSG00000182326 C1S This gene encodes a serine protease, which is a major constituent of the human complement subcomponent C1. C1s associates with two other complement components C1r and C1q in order to yield the first component of the serum complement system. Defects in this gene are the cause of selective C1s deficiency. 716 complement component 1, s subcomponent
ENSG00000196091 MYBPC1 This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 4604 myosin binding protein C, slow type
ENSG00000091704 CPA1 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. 1357 carboxypeptidase A1
ENSG00000072778 ACADVL The protein encoded by this gene is targeted to the inner mitochondrial membrane where it catalyzes the first step of the mitochondrial fatty acid beta-oxidation pathway. This acyl-Coenzyme A dehydrogenase is specific to long-chain and very-long-chain fatty acids. A deficiency in this gene product reduces myocardial fatty acid beta-oxidation and is associated with cardiomyopathy. Alternative splicing results in multiple transcript variants encoding different isoforms. 37 acyl-CoA dehydrogenase, very long chain
ENSG00000135821 GLUL The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. 2752 glutamate-ammonia ligase
ENSG00000204983 PRSS1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. 5644 protease, serine 1
ENSG00000221978 CCNL2 The protein encoded by this gene belongs to the cyclin family. Through its interaction with several proteins, such as RNA polymerase II, splicing factors, and cyclin-dependent kinases, this protein functions as a regulator of the pre-mRNA splicing process, as well as in inducing apoptosis by modulating the expression of apoptotic and antiapoptotic proteins. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. 81669 cyclin L2
ENSG00000010327 STAB1 This gene encodes a large, transmembrane receptor protein which may function in angiogenesis, lymphocyte homing, cell adhesion, or receptor scavenging. The protein contains 7 fasciclin, 16 epidermal growth factor (EGF)-like, and 2 laminin-type EGF-like domains as well as a C-type lectin-like hyaluronan-binding Link module. The protein is primarily expressed on sinusoidal endothelial cells of liver, spleen, and lymph node. The receptor has been shown to endocytose ligands such as low density lipoprotein, Gram-positive and Gram-negative bacteria, and advanced glycosylation end products. Supporting its possible role as a scavenger receptor, the protein rapidly cycles between the plasma membrane and early endosomes. 23166 stabilin 1
ENSG00000132386 SERPINF1 The protein encoded by this gene is a member of the serpin family, although it does not display the serine protease inhibitory activity shown by many of the other serpin family members. The encoded protein is secreted and strongly inhibits angiogenesis. In addition, this protein is a neurotrophic factor involved in neuronal differentiation in retinoblastoma cells. 5176 serpin family F member 1
ENSG00000109061 MYH1 Myosin is a major contractile protein which converts chemical energy into mechanical energy through the hydrolysis of ATP. Myosin is a hexameric protein composed of a pair of myosin heavy chains (MYH) and two pairs of nonidentical light chains. Myosin heavy chains are encoded by a multigene family. In mammals at least 10 different myosin heavy chain (MYH) isoforms have been described from striated, smooth, and nonmuscle cells. These isoforms show expression that is spatially and temporally regulated during development. 4619 myosin, heavy chain 1, skeletal muscle, adult
ENSG00000065534 MYLK This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3’ region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts. 4638 myosin light chain kinase
ENSG00000170835 CEL The protein encoded by this gene is a glycoprotein secreted from the pancreas into the digestive tract and from the lactating mammary gland into human milk. The physiological role of this protein is in cholesterol and lipid-soluble vitamin ester hydrolysis and absorption. This encoded protein promotes large chylomicron production in the intestine. Also its presence in plasma suggests its interactions with cholesterol and oxidized lipoproteins to modulate the progression of atherosclerosis. In pancreatic tumoral cells, this encoded protein is thought to be sequestrated within the Golgi compartment and is probably not secreted. This gene contains a variable number of tandem repeat (VNTR) polymorphism in the coding region that may influence the function of the encoded protein. 1056 carboxyl ester lipase
ENSG00000134339 SAA2 NA 6289 serum amyloid A2
ENSG00000188257 PLA2G2A The protein encoded by this gene is a member of the phospholipase A2 family (PLA2). PLA2s constitute a diverse family of enzymes with respect to sequence, function, localization, and divalent cation requirements. This gene product belongs to group II, which contains secreted form of PLA2, an extracellular enzyme that has a low molecular mass and requires calcium ions for catalysis. It catalyzes the hydrolysis of the sn-2 fatty acid acyl ester bond of phosphoglycerides, releasing free fatty acids and lysophospholipids, and thought to participate in the regulation of the phospholipid metabolism in biomembranes. Several alternatively spliced transcript variants with different 5’ UTRs have been found for this gene. 5320 phospholipase A2 group IIA
ENSG00000101474 APMAP NA 57136 adipocyte plasma membrane associated protein
ENSG00000198523 PLN The protein encoded by this gene is found as a pentamer and is a major substrate for the cAMP-dependent protein kinase in cardiac muscle. The encoded protein is an inhibitor of cardiac muscle sarcoplasmic reticulum Ca(2+)-ATPase in the unphosphorylated state, but inhibition is relieved upon phosphorylation of the protein. The subsequent activation of the Ca(2+) pump leads to enhanced muscle relaxation rates, thereby contributing to the inotropic response elicited in heart by beta-agonists. The encoded protein is a key regulator of cardiac diastolic function. Mutations in this gene are a cause of inherited human dilated cardiomyopathy with refractory congestive heart failure, and also familial hypertrophic cardiomyopathy. 5350 phospholamban
ENSG00000153002 CPB1 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma. 1360 carboxypeptidase B1
ENSG00000142156 COL6A1 The collagens are a superfamily of proteins that play a role in maintaining the integrity of various tissues. Collagens are extracellular matrix proteins and have a triple-helical domain as their common structural element. Collagen VI is a major structural component of microfibrils. The basic structural unit of collagen VI is a heterotrimer of the alpha1(VI), alpha2(VI), and alpha3(VI) chains. The alpha2(VI) and alpha3(VI) chains are encoded by the COL6A2 and COL6A3 genes, respectively. The protein encoded by this gene is the alpha 1 subunit of type VI collagen (alpha1(VI) chain). Mutations in the genes that code for the collagen VI subunits result in the autosomal dominant disorder, Bethlem myopathy. 1291 collagen type VI alpha 1
ENSG00000184232 OAF NA 220323 out at first homolog
ENSG00000155657 TTN This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. 7273 titin
ENSG00000167768 KRT1 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3848 keratin 1
ENSG00000072110 ACTN1 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, cytoskeletal, alpha actinin isoform and maps to the same site as the structurally similar erythroid beta spectrin gene. Three transcript variants encoding different isoforms have been found for this gene. 87 actinin alpha 1
ENSG00000062282 DGAT2 This gene encodes one of two enzymes which catalyzes the final reaction in the synthesis of triglycerides in which diacylglycerol is covalently bound to long chain fatty acyl-CoAs. The encoded protein catalyzes this reaction at low concentrations of magnesium chloride while the other enzyme has high activity at high concentrations of magnesium chloride. Multiple transcript variants encoding different isoforms have been found for this gene. 84649 diacylglycerol O-acyltransferase 2
ENSG00000140575 IQGAP1 This gene encodes a member of the IQGAP family. The protein contains four IQ domains, one calponin homology domain, one Ras-GAP domain and one WW domain. It interacts with components of the cytoskeleton, with cell adhesion molecules, and with several signaling molecules to regulate cell morphology and motility. Expression of the protein is upregulated by gene amplification in two gastric cancer cell lines. 8826 IQ motif containing GTPase activating protein 1
ENSG00000081189 MEF2C This locus encodes a member of the MADS box transcription enhancer factor 2 (MEF2) family of proteins, which play a role in myogenesis. The encoded protein, MEF2 polypeptide C, has both trans-activating and DNA binding activities. This protein may play a role in maintaining the differentiated state of muscle cells. Mutations and deletions at this locus have been associated with severe mental retardation, stereotypic movements, epilepsy, and cerebral malformation. Alternatively spliced transcript variants have been described. 4208 myocyte enhancer factor 2C
ENSG00000166741 NNMT N-methylation is one method by which drug and other xenobiotic compounds are metabolized by the liver. This gene encodes the protein responsible for this enzymatic activity which uses S-adenosyl methionine as the methyl donor. 4837 nicotinamide N-methyltransferase
ENSG00000143248 RGS5 This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. 8490 regulator of G-protein signaling 5
ENSG00000142789 CELA3A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. 10136 chymotrypsin like elastase family member 3A
ENSG00000166145 SPINT1 The protein encoded by this gene is a member of the Kunitz family of serine protease inhibitors. The protein is a potent inhibitor specific for HGF activator and is thought to be involved in the regulation of the proteolytic activation of HGF in injured tissues. Alternative splicing results in multiple variants encoding different isoforms. 6692 serine peptidase inhibitor, Kunitz type 1
ENSG00000090382 LYZ This gene encodes human lysozyme, whose natural substrate is the bacterial cell wall peptidoglycan (cleaving the beta[1-4]glycosidic linkages between N-acetylmuramic acid and N-acetylglucosamine). Lysozyme is one of the antimicrobial agents found in human milk, and is also present in spleen, lung, kidney, white blood cells, plasma, saliva, and tears. The protein has antibacterial activity against a number of bacterial species. Missense mutations in this gene have been identified in heritable renal amyloidosis. 4069 lysozyme
ENSG00000182253 SYNM The protein encoded by this gene is an intermediate filament (IF) family member. IF proteins are cytoskeletal proteins that confer resistance to mechanical stress and are encoded by a dispersed multigene family. This protein has been found to form a linkage between desmin, which is a subunit of the IF network, and the extracellular matrix, and provides an important structural support in muscle. Two alternatively spliced variants encoding different isoforms have been described for this gene. 23336 synemin
ENSG00000183091 NEB This gene encodes nebulin, a giant protein component of the cytoskeletal matrix that coexists with the thick and thin filaments within the sarcomeres of skeletal muscle. In most vertebrates, nebulin accounts for 3 to 4% of the total myofibrillar protein. The encoded protein contains approximately 30-amino acid long modules that can be classified into 7 types and other repeated modules. Protein isoform sizes vary from 600 to 800 kD due to alternative splicing that is tissue-, species-, and developmental stage-specific. Of the 183 exons in the nebulin gene, at least 43 are alternatively spliced, although exons 143 and 144 are not found in the same transcript. Of the several thousand transcript variants predicted for nebulin, the RefSeq Project has decided to create three representative RefSeq records. Mutations in this gene are associated with recessive nemaline myopathy. 4703 nebulin
ENSG00000134571 MYBPC3 MYBPC3 encodes the cardiac isoform of myosin-binding protein C. Myosin-binding protein C is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. MYBPC3, the cardiac isoform, is expressed exclusively in heart muscle. Regulatory phosphorylation of the cardiac isoform in vivo by cAMP-dependent protein kinase (PKA) upon adrenergic stimulation may be linked to modulation of cardiac contraction. Mutations in MYBPC3 are one cause of familial hypertrophic cardiomyopathy. 4607 myosin binding protein C, cardiac
ENSG00000130176 CNN1 NA 1264 calponin 1
ENSG00000114378 HYAL1 This gene encodes a lysosomal hyaluronidase. Hyaluronidases intracellularly degrade hyaluronan, one of the major glycosaminoglycans of the extracellular matrix. Hyaluronan is thought to be involved in cell proliferation, migration and differentiation. This enzyme is active at an acidic pH and is the major hyaluronidase in plasma. Mutations in this gene are associated with mucopolysaccharidosis type IX, or hyaluronidase deficiency. The gene is one of several related genes in a region of chromosome 3p21.3 associated with tumor suppression. Multiple transcript variants encoding different isoforms have been found for this gene. 3373 hyaluronoglucosaminidase 1
ENSG00000160014 CALM3 NA 808 calmodulin 3 (phosphorylase kinase, delta)
ENSG00000160014 CALM2 This gene is a member of the calmodulin gene family. There are three distinct calmodulin genes dispersed throughout the genome that encode the identical protein, but differ at the nucleotide level. Calmodulin is a calcium binding protein that plays a role in signaling pathways, cell cycle progression and proliferation. Several infants with severe forms of long-QT syndrome (LQTS) who displayed life-threatening ventricular arrhythmias together with delayed neurodevelopment and epilepsy were found to have mutations in either this gene or another member of the calmodulin gene family (PMID:23388215). Mutations in this gene have also been identified in patients with less severe forms of LQTS (PMID:24917665), while mutations in another calmodulin gene family member have been associated with catecholaminergic polymorphic ventricular tachycardia (CPVT)(PMID:23040497), a rare disorder thought to be the cause of a significant fraction of sudden cardiac deaths in young individuals. Pseudogenes of this gene are found on chromosomes 10, 13, and 17. Alternative splicing results in multiple transcript variants encoding different isoforms. 805 calmodulin 2 (phosphorylase kinase, delta)
ENSG00000172023 REG1B This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. 5968 regenerating family member 1 beta
ENSG00000123689 G0S2 NA 50486 G0/G1 switch 2
ENSG00000235162 C12orf75 NA 387882 chromosome 12 open reading frame 75
ENSG00000118194 TNNT2 The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms, however, the full-length nature of some of these variants has not yet been determined. 7139 troponin T2, cardiac type
ENSG00000166147 FBN1 This gene encodes a member of the fibrillin family of proteins. The encoded preproprotein is proteolytically processed to generate two proteins including the extracellular matrix component fibrillin-1 and the protein hormone asprosin. Fibrillin-1 is an extracellular matrix glycoprotein that serves as a structural component of calcium-binding microfibrils. These microfibrils provide force-bearing structural support in elastic and nonelastic connective tissue throughout the body. Asprosin, secreted by white adipose tissue, has been shown to regulate glucose homeostasis. Mutations in this gene are associated with Marfan syndrome and the related MASS phenotype, as well as ectopia lentis syndrome, Weill-Marchesani syndrome, Shprintzen-Goldberg syndrome and neonatal progeroid syndrome. 2200 fibrillin 1
ENSG00000124942 AHNAK NA 79026 AHNAK nucleoprotein
ENSG00000172867 KRT2 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3849 keratin 2
ENSG00000177791 MYOZ1 The protein encoded by this gene is primarily expressed in the skeletal muscle, and belongs to the myozenin family. Members of this family function as calcineurin-interacting proteins that help tether calcineurin to the sarcomere of cardiac and skeletal muscle. They play an important role in modulation of calcineurin signaling. 58529 myozenin 1
ENSG00000255071 SAA2-SAA4 This locus represents naturally occurring read-through transcription between the neighboring serum amyloid A2 and serum amyloid A4 genes on chromosome 11. The read-through transcript produces a fusion protein that shares sequence identity with each individual gene product. 100528017 SAA2-SAA4 readthrough
ENSG00000108551 RASD1 This gene encodes a member of the Ras superfamily of small GTPases and is induced by dexamethasone. The encoded protein is an activator of G-protein signaling and acts as a direct nucleotide exchange factor for Gi-Go proteins. This protein interacts with the neuronal nitric oxide adaptor protein CAPON, and a nuclear adaptor protein FE65, which interacts with the Alzheimer’s disease amyloid precursor protein. This gene may play a role in dexamethasone-induced alterations in cell morphology, growth and cell-extracellular matrix interactions. Epigenetic inactivation of this gene is closely correlated with resistance to dexamethasone in multiple myeloma cells. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 51655 ras related dexamethasone induced 1
ENSG00000074800 ENO1 This gene encodes alpha-enolase, one of three enolase isoenzymes found in mammals. Each isoenzyme is a homodimer composed of 2 alpha, 2 gamma, or 2 beta subunits, and functions as a glycolytic enzyme. Alpha-enolase in addition, functions as a structural lens protein (tau-crystallin) in the monomeric form. Alternative splicing of this gene results in a shorter isoform that has been shown to bind to the c-myc promoter and function as a tumor suppressor. Several pseudogenes have been identified, including one on the long arm of chromosome 1. Alpha-enolase has also been identified as an autoantigen in Hashimoto encephalopathy. 2023 enolase 1
ENSG00000135046 ANXA1 This gene encodes a membrane-localized protein that binds phospholipids. This protein inhibits phospholipase A2 and has anti-inflammatory activity. Loss of function or expression of this gene has been detected in multiple tumors. 301 annexin A1
ENSG00000164924 YWHAZ This gene product belongs to the 14-3-3 family of proteins which mediate signal transduction by binding to phosphoserine-containing proteins. This highly conserved protein family is found in both plants and mammals, and this protein is 99% identical to the mouse, rat and sheep orthologs. The encoded protein interacts with IRS1 protein, suggesting a role in regulating insulin sensitivity. Several transcript variants that differ in the 5’ UTR but that encode the same protein have been identified for this gene. 7534 tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein zeta
ENSG00000042832 TG Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. 7038 thyroglobulin
ENSG00000153071 DAB2 This gene encodes a mitogen-responsive phosphoprotein. It is expressed in normal ovarian epithelial cells, but is down-regulated or absent from ovarian carcinoma cell lines, suggesting its role as a tumor suppressor. This protein binds to the SH3 domains of GRB2, an adaptor protein that couples tyrosine kinase receptors to SOS (a guanine nucleotide exchange factor for Ras), via its C-terminal proline-rich sequences, and may thus modulate growth factor/Ras pathways by competing with SOS for binding to GRB2. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 1601 DAB2, clathrin adaptor protein
ENSG00000009307 CSDE1 NA 7812 cold shock domain containing E1
ENSG00000133985 TTC9 This gene encodes a protein that contains three tetratricopeptide repeats. The gene has been shown to be hormonally regulated in breast cancer cells and may play a role in cancer cell invasion and metastasis. 23508 tetratricopeptide repeat domain 9
ENSG00000196296 ATP2A1 This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol to the sarcoplasmic reticulum lumen, and is involved in muscular excitation and contraction. Mutations in this gene cause some autosomal recessive forms of Brody disease, characterized by increasing impairment of muscular relaxation during exercise. Alternative splicing results in three transcript variants encoding different isoforms. 487 ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 1
ENSG00000065717 TLE2 NA 7089 transducin like enhancer of split 2
ENSG00000206172 HBA1 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. 3039 hemoglobin subunit alpha 1
ENSG00000106991 ENG This gene encodes a homodimeric transmembrane protein which is a major glycoprotein of the vascular endothelium. This protein is a component of the transforming growth factor beta receptor complex and it binds to the beta1 and beta3 peptides with high affinity. Mutations in this gene cause hereditary hemorrhagic telangiectasia, also known as Osler-Rendu-Weber syndrome 1, an autosomal dominant multisystemic vascular dysplasia. This gene may also be involved in preeclampsia and several types of cancer. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 2022 endoglin
ENSG00000084207 GSTP1 Glutathione S-transferases (GSTs) are a family of enzymes that play an important role in detoxification by catalyzing the conjugation of many hydrophobic and electrophilic compounds with reduced glutathione. Based on their biochemical, immunologic, and structural properties, the soluble GSTs are categorized into 4 main classes: alpha, mu, pi, and theta. This GST family member is a polymorphic gene encoding active, functionally different GSTP1 variant proteins that are thought to function in xenobiotic metabolism and play a role in susceptibility to cancer, and other diseases. 2950 glutathione S-transferase pi 1
ENSG00000138185 ENTPD1 The protein encoded by this gene is a plasma membrane protein that hydrolyzes extracellular ATP and ADP to AMP. Inhibition of this protein’s activity may confer anticancer benefits. Several transcript variants encoding different isoforms have been found for this gene. 953 ectonucleoside triphosphate diphosphohydrolase 1
ENSG00000272752 STAG3L5P-PVRIG2P-PILRB This locus represents naturally occurring readthrough transcription among the neighboring LOC101735302 (stromal antigen 3 pseudogene), LOC101752334 (poliovirus receptor related immunoglobulin domain containing pseudogene) and PILRB (paired immunoglobin-like type 2 receptor beta) genes on chromosome 7. The readthrough transcript is a candidate for nonsense-mediated mRNA decay (NMD), and is unlikely to produce a protein product. 101752399 STAG3L5P-PVRIG2P-PILRB readthrough
ENSG00000211445 GPX3 This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. 2878 glutathione peroxidase 3
ENSG00000157601 MX1 This gene encodes a guanosine triphosphate (GTP)-metabolizing protein that participates in the cellular antiviral response. The encoded protein is induced by type I and type II interferons and antagonizes the replication process of several different RNA and DNA viruses. There is a related gene located adjacent to this gene on chromosome 21, and there are multiple pseudogenes located in a cluster on chromosome 4. Alternative splicing results in multiple transcript variants. 4599 MX dynamin like GTPase 1
ENSG00000140564 FURIN This gene encodes a member of the subtilisin-like proprotein convertase family, which includes proteases that process protein and peptide precursors trafficking through regulated or constitutive branches of the secretory pathway. It encodes a type 1 membrane bound protease that is expressed in many tissues, including neuroendocrine, liver, gut, and brain. The encoded protein undergoes an initial autocatalytic processing event in the ER and then sorts to the trans-Golgi network through endosomes where a second autocatalytic event takes place and the catalytic activity is acquired. The product of this gene is one of the seven basic amino acid-specific members which cleave their substrates at single or paired basic residues. Some of its substrates include proparathyroid hormone, transforming growth factor beta 1 precursor, proalbumin, pro-beta-secretase, membrane type-1 matrix metalloproteinase, beta subunit of pro-nerve growth factor and von Willebrand factor. It is also thought to be one of the proteases responsible for the activation of HIV envelope glycoproteins gp160 and gp140 and may play a role in tumor progression. This gene is located in close proximity to family member proprotein convertase subtilisin/kexin type 6 and upstream of the FES oncogene. Alternative splicing results in multiple transcript variants. 5045 furin, paired basic amino acid cleaving enzyme
ENSG00000101470 TNNC2 Troponin (Tn), a key protein complex in the regulation of striated muscle contraction, is composed of 3 subunits. The Tn-I subunit inhibits actomyosin ATPase, the Tn-T subunit binds tropomyosin and Tn-C, while the Tn-C subunit binds calcium and overcomes the inhibitory action of the troponin complex on actin filaments. The protein encoded by this gene is the Tn-C subunit. 7125 troponin C2, fast skeletal type
ENSG00000168530 MYL1 Myosin is a hexameric ATPase cellular motor protein. It is composed of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable regulatory light chains. This gene encodes a myosin alkali light chain expressed in fast skeletal muscle. Two transcript variants have been identified for this gene. 4632 myosin light chain 1
ENSG00000154553 PDLIM3 The protein encoded by this gene contains a PDZ domain and a LIM domain, indicating that it may be involved in cytoskeletal assembly. In support of this, the encoded protein has been shown to bind the spectrin-like repeats of alpha-actinin-2 and to colocalize with alpha-actinin-2 at the Z lines of skeletal muscle. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. Aberrant alternative splicing of this gene may play a role in myotonic dystrophy. 27295 PDZ and LIM domain 3
ENSG00000169738 DCXR The protein encoded by this gene acts as a homotetramer to catalyze diacetyl reductase and L-xylulose reductase reactions. The encoded protein may play a role in the uronate cycle of glucose metabolism and in the cellular osmoregulation in the proximal renal tubules. Defects in this gene are a cause of pentosuria. Two transcript variants encoding different isoforms have been found for this gene. 51181 dicarbonyl/L-xylulose reductase
ENSG00000163346 PBXIP1 The protein encoded by this gene interacts with the PBX1 homeodomain protein, inhibiting its transcriptional activation potential by preventing its binding to DNA. The encoded protein, which is primarily cytosolic but can shuttle to the nucleus, also can interact with estrogen receptors alpha and beta and promote the proliferation of breast cancer, brain tumors, and lung cancer. Several transcript variants encoding different isoforms have been found for this gene. More variants exist, but their full-length natures have yet to be determined. 57326 PBX homeobox interacting protein 1
ENSG00000245848 CEBPA This intronless gene encodes a transcription factor that contains a basic leucine zipper (bZIP) domain and recognizes the CCAAT motif in the promoters of target genes. The encoded protein functions in homodimers and also heterodimers with CCAAT/enhancer-binding proteins beta and gamma. Activity of this protein can modulate the expression of genes involved in cell cycle regulation as well as in body weight homeostasis. Mutation of this gene is associated with acute myeloid leukemia. The use of alternative in-frame non-AUG (GUG) and AUG start codons results in protein isoforms with different lengths. Differential translation initiation is mediated by an out-of-frame, upstream open reading frame which is located between the GUG and the first AUG start codons. 1050 CCAAT/enhancer binding protein alpha
ENSG00000112096 LOC100129518 NA 100129518 uncharacterized LOC100129518
ENSG00000112096 SOD2 This gene is a member of the iron/manganese superoxide dismutase family. It encodes a mitochondrial protein that forms a homotetramer and binds one manganese ion per subunit. This protein binds to the superoxide byproducts of oxidative phosphorylation and converts them to hydrogen peroxide and diatomic oxygen. Mutations in this gene have been associated with idiopathic cardiomyopathy (IDC), premature aging, sporadic motor neuron disease, and cancer. Alternative splicing of this gene results in multiple transcript variants. A related pseudogene has been identified on chromosome 1. 6648 superoxide dismutase 2, mitochondrial
ENSG00000109846 CRYAB Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. 1410 crystallin alpha B
ENSG00000169710 FASN The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. 2194 fatty acid synthase
ENSG00000081041 CXCL2 This antimicrobial gene is part of a chemokine superfamily that encodes secreted proteins involved in immunoregulatory and inflammatory processes. The superfamily is divided into four subfamilies based on the arrangement of the N-terminal cysteine residues of the mature peptide. This chemokine, a member of the CXC subfamily, is expressed at sites of inflammation and may suppress hematopoietic progenitor cell proliferation. 2920 C-X-C motif chemokine ligand 2
# Save the Ensembl IDs returned by the factor 4 query, one per line
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 4, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
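
The Factor 4 table above lists ENSG00000112096 twice (once as LOC100129518 and once as SOD2), so `out$query` can contain repeated IDs and the saved gene list may repeat them as well. The following is a minimal sketch of one way to guard against that, not the method used in this analysis. The helper name `write_factor_genes()`, the toy ID vector, and the use of `tempdir()` in place of the real `../utilities/...` path are all assumptions made for illustration.

```r
# Hypothetical helper (not part of the original analysis): write the unique
# query IDs for factor k to a one-column text file and return the file path.
write_factor_genes <- function(query_ids, k, out_dir) {
  ids <- unique(as.character(query_ids))  # drop repeated Ensembl IDs, keep order
  out_file <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(ids, out_file,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(out_file)
}

# Toy input mirroring the duplicated ENSG00000112096 rows in the table above
demo_ids  <- c("ENSG00000112096", "ENSG00000112096", "ENSG00000109846")
demo_file <- write_factor_genes(demo_ids, 4, tempdir())
readLines(demo_file)
```

In the analysis itself the call would presumably look like `write_factor_genes(out$query, 4, "../utilities/GTEX2013_sparse_fac_sqrt")`. Whether duplicates should be dropped depends on what the downstream tool expects, so treat this as optional.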

Factor 5 Annotations

# Annotate the top genes of factor 5 with mygene (Ensembl gene IDs, human)
out <- mygene::queryMany(gene_list[5,], scopes = "ensembl.gene",
                         fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name X_id summary symbol query notfound
cytochrome P450 family 17 subfamily A member 1 1586 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. It has both 17alpha-hydroxylase and 17,20-lyase activities and is a key enzyme in the steroidogenic pathway that produces progestins, mineralocorticoids, glucocorticoids, androgens, and estrogens. Mutations in this gene are associated with isolated steroid-17 alpha-hydroxylase deficiency, 17-alpha-hydroxylase/17,20-lyase deficiency, pseudohermaphroditism, and adrenal hyperplasia. CYP17A1 ENSG00000148795 NA
cytochrome P450 family 11 subfamily B member 1 1584 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the mitochondrial inner membrane and is involved in the conversion of progesterone to cortisol in the adrenal cortex. Mutations in this gene cause congenital adrenal hyperplasia due to 11-beta-hydroxylase deficiency. Transcript variants encoding different isoforms have been noted for this gene. CYP11B1 ENSG00000160882 NA
immunoglobulin heavy constant alpha 1 ENSG00000211895 NA IGHA1 ENSG00000211895 NA
steroidogenic acute regulatory protein 6770 The protein encoded by this gene plays a key role in the acute regulation of steroid hormone synthesis by enhancing the conversion of cholesterol into pregnenolone. This protein permits the cleavage of cholesterol into pregnenolone by mediating the transport of cholesterol from the outer mitochondrial membrane to the inner mitochondrial membrane. Mutations in this gene are a cause of congenital lipoid adrenal hyperplasia (CLAH), also called lipoid CAH. A pseudogene of this gene is located on chromosome 13. STAR ENSG00000147465 NA
actin, gamma 2, smooth muscle, enteric 72 Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. ACTG2 ENSG00000163017 NA
apolipoprotein E 348 The protein encoded by this gene is a major apoprotein of the chylomicron. It binds to a specific liver and peripheral cell receptor, and is essential for the normal catabolism of triglyceride-rich lipoprotein constituents. This gene maps to chromosome 19 in a cluster with the related apolipoprotein C1 and C2 genes. Mutations in this gene result in familial dysbetalipoproteinemia, or type III hyperlipoproteinemia (HLP III), in which increased plasma cholesterol and triglycerides are the consequence of impaired clearance of chylomicron and VLDL remnants. Alternative splicing results in multiple transcript variants. APOE ENSG00000130203 NA
24-dehydrocholesterol reductase 1718 This gene encodes a flavin adenine dinucleotide (FAD)-dependent oxidoreductase which catalyzes the reduction of the delta-24 double bond of sterol intermediates during cholesterol biosynthesis. The protein contains a leader sequence that directs it to the endoplasmic reticulum membrane. Missense mutations in this gene have been associated with desmosterolosis. Also, reduced expression of the gene occurs in the temporal cortex of Alzheimer disease patients and overexpression has been observed in adrenal gland cancer cells. DHCR24 ENSG00000116133 NA
protamine 2 5620 Protamines substitute for histones in the chromatin of sperm during the haploid phase of spermatogenesis, and are the major DNA-binding proteins in the nucleus of sperm in many vertebrates. They package the sperm DNA into a highly condensed complex in a volume less than 5% of a somatic cell nucleus. Many mammalian species have only one protamine (protamine 1); however, a few species, including human and mouse, have two. This gene encodes protamine 2, which is cleaved to give rise to a family of protamine 2 peptides. Alternatively spliced transcript variants have also been found for this gene. PRM2 ENSG00000122304 NA
cytochrome P450 family 11 subfamily A member 1 1583 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the mitochondrial inner membrane and catalyzes the conversion of cholesterol to pregnenolone, the first and rate-limiting step in the synthesis of the steroid hormones. Two transcript variants encoding different isoforms have been found for this gene. The cellular location of the smaller isoform is unclear since it lacks the mitochondrial-targeting transit peptide. CYP11A1 ENSG00000140459 NA
tropomyosin 1 (alpha) 7168 This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. TPM1 ENSG00000140416 NA
FOS like 2, AP-1 transcription factor subunit 2355 The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. FOSL2 ENSG00000075426 NA
actin, beta 60 This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. ACTB ENSG00000075624 NA
scavenger receptor class B member 1 949 The protein encoded by this gene is a plasma membrane receptor for high density lipoprotein cholesterol (HDL). The encoded protein mediates cholesterol transfer to and from HDL. In addition, this protein is a receptor for hepatitis C virus glycoprotein E2. Two transcript variants encoding different isoforms have been found for this gene. SCARB1 ENSG00000073060 NA
acyl-CoA synthetase long-chain family member 1 2180 The protein encoded by this gene is an isozyme of the long-chain fatty-acid-coenzyme A ligase family. Although differing in substrate specificity, subcellular localization, and tissue distribution, all isozymes of this family convert free long-chain fatty acids into fatty acyl-CoA esters, and thereby play a key role in lipid biosynthesis and fatty acid degradation. Several transcript variants encoding different isoforms have been found for this gene. ACSL1 ENSG00000151726 NA
ubiquitin C 7316 This gene represents a ubiquitin gene, ubiquitin C. The encoded protein is a polyubiquitin precursor. Conjugation of ubiquitin monomers or polymers can lead to various effects within a cell, depending on the residues to which ubiquitin is conjugated. Ubiquitination has been associated with protein degradation, DNA repair, cell cycle regulation, kinase modification, endocytosis, and regulation of other cell signaling pathways. UBC ENSG00000150991 NA
creatine kinase B 1152 The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in brain as well as in other tissues, and as a heterodimer with a similar muscle isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. A pseudogene of this gene has been characterized. CKB ENSG00000166165 NA
haptoglobin 3240 This gene encodes a preproprotein, which is processed to yield both alpha and beta chains, which subsequently combine as a tetramer to produce haptoglobin. Haptoglobin functions to bind free plasma hemoglobin, which allows degradative enzymes to gain access to the hemoglobin, while at the same time preventing loss of iron through the kidneys and protecting the kidneys from damage by hemoglobin. Mutations in this gene and/or its regulatory regions cause ahaptoglobinemia or hypohaptoglobinemia. This gene has also been linked to diabetic nephropathy, the incidence of coronary artery disease in type 1 diabetes, Crohn’s disease, inflammatory disease behavior, primary sclerosing cholangitis, susceptibility to idiopathic Parkinson’s disease, and a reduced incidence of Plasmodium falciparum malaria. The protein encoded also exhibits antimicrobial activity against bacteria. A similar duplicated gene is located next to this gene on chromosome 16. Multiple transcript variants encoding different isoforms have been found for this gene. HP ENSG00000257017 NA
dual specificity phosphatase 1 1843 The expression of DUSP1 gene is induced in human skin fibroblasts by oxidative/heat stress and growth factors. It specifies a protein with structural features similar to members of the non-receptor-type protein-tyrosine phosphatase family, and which has significant amino-acid sequence similarity to a Tyr/Ser-protein phosphatase encoded by the late gene H1 of vaccinia virus. The bacterially expressed and purified DUSP1 protein has intrinsic phosphatase activity, and specifically inactivates mitogen-activated protein (MAP) kinase in vitro by the concomitant dephosphorylation of both its phosphothreonine and phosphotyrosine residues. Furthermore, it suppresses the activation of MAP kinase by oncogenic ras in extracts of Xenopus oocytes. Thus, DUSP1 may play an important role in the human cellular response to environmental stress as well as in the negative regulation of cellular proliferation. DUSP1 ENSG00000120129 NA
aldehyde oxidase 1 316 Aldehyde oxidase produces hydrogen peroxide and, under certain conditions, can catalyze the formation of superoxide. Aldehyde oxidase is a candidate gene for amyotrophic lateral sclerosis. AOX1 ENSG00000138356 NA
protamine 1 5619 NA PRM1 ENSG00000175646 NA
transgelin 6876 The protein encoded by this gene is a transformation and shape-change sensitive actin cross-linking/gelling protein found in fibroblasts and smooth muscle. Its expression is down-regulated in many cell lines, and this down-regulation may be an early and sensitive marker for the onset of transformation. A functional role of this protein is unclear. Two transcript variants encoding the same protein have been found for this gene. TAGLN ENSG00000149591 NA
periplakin 5493 The protein encoded by this gene is a component of desmosomes and of the epidermal cornified envelope in keratinocytes. The N-terminal domain of this protein interacts with the plasma membrane and its C-terminus interacts with intermediate filaments. Through its rod domain, this protein forms complexes with envoplakin. This protein may serve as a link between the cornified envelope and desmosomes as well as intermediate filaments. AKT1/PKB, a protein kinase mediating a variety of cell growth and survival signaling processes, is reported to interact with this protein, suggesting a possible role for this protein as a localization signal in AKT1-mediated signaling. PPL ENSG00000118898 NA
profilin 1 5216 This gene encodes a member of the profilin family of small actin-binding proteins. The encoded protein plays an important role in actin dynamics by regulating actin polymerization in response to extracellular signals. Deletion of this gene is associated with Miller-Dieker syndrome, and the encoded protein may also play a role in Huntington disease. Multiple pseudogenes of this gene are located on chromosome 1. PFN1 ENSG00000108518 NA
N-myc downstream regulated 1 10397 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein involved in stress responses, hormone responses, cell growth, and differentiation. The encoded protein is necessary for p53-mediated caspase activation and apoptosis. Mutations in this gene are a cause of Charcot-Marie-Tooth disease type 4D, and expression of this gene may be a prognostic indicator for several types of cancer. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NDRG1 ENSG00000104419 NA
keratin 5 3852 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the basal layer of the epidermis with family member KRT14. Mutations in these genes have been associated with a complex of diseases termed epidermolysis bullosa simplex. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. KRT5 ENSG00000186081 NA
epoxide hydrolase 1 2052 Epoxide hydrolase is a critical biotransformation enzyme that converts epoxides from the degradation of aromatic compounds to trans-dihydrodiols which can be conjugated and excreted from the body. Epoxide hydrolase functions in both the activation and detoxification of epoxides. Mutations in this gene cause preeclampsia, epoxide hydrolase deficiency or increased epoxide hydrolase activity. Alternatively spliced transcript variants encoding the same protein have been found for this gene. EPHX1 ENSG00000143819 NA
immunoglobulin heavy constant mu ENSG00000211899 NA IGHM ENSG00000211899 NA
serine peptidase inhibitor, Kazal type 5 11005 This gene encodes a multidomain serine protease inhibitor that contains 15 potential inhibitory domains. The encoded preproprotein is proteolytically processed to generate multiple protein products, which may exhibit unique activities and specificities. These proteins may play a role in skin and hair morphogenesis, as well as anti-inflammatory and antimicrobial protection of mucous epithelia. Mutations in this gene may result in Netherton syndrome, a disorder characterized by ichthyosis, defective cornification, and atopy. This gene is present in a gene cluster on chromosome 5. Alternative splicing results in multiple transcript variants. SPINK5 ENSG00000133710 NA
solute carrier family 40 member 1 30061 The protein encoded by this gene is a cell membrane protein that may be involved in iron export from duodenal epithelial cells. Defects in this gene are a cause of hemochromatosis type 4 (HFE4). SLC40A1 ENSG00000138449 NA
immunoglobulin heavy constant alpha 2 (A2m marker) ENSG00000211890 NA IGHA2 ENSG00000211890 NA
Fos proto-oncogene, AP-1 transcription factor subunit 2353 The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. In some cases, expression of the FOS gene has also been associated with apoptotic cell death. FOS ENSG00000170345 NA
myosin, heavy chain 7, cardiac muscle, beta 4625 Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. MYH7 ENSG00000092054 NA
keratin 15 3866 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains and are clustered in a region on chromosome 17q21.2. KRT15 ENSG00000171346 NA
ribosomal protein SA 3921 Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Many of the effects of laminin are mediated through interactions with cell surface receptors. These receptors include members of the integrin family, as well as non-integrin laminin-binding proteins. This gene encodes a high-affinity, non-integrin family, laminin receptor 1. This receptor has been variously called 67 kD laminin receptor, 37 kD laminin receptor precursor (37LRP) and p40 ribosome-associated protein. The amino acid sequence of laminin receptor 1 is highly conserved through evolution, suggesting a key biological function. It has been observed that the level of the laminin receptor transcript is higher in colon carcinoma tissue and lung cancer cell line than their normal counterparts. Also, there is a correlation between the upregulation of this polypeptide in cancer cells and their invasive and metastatic phenotype. Multiple copies of this gene exist, however, most of them are pseudogenes thought to have arisen from retropositional events. Two alternatively spliced transcript variants encoding the same protein have been found for this gene. RPSA ENSG00000168028 NA
polymeric immunoglobulin receptor 5284 This gene is a member of the immunoglobulin superfamily. The encoded poly-Ig receptor binds polymeric immunoglobulin molecules at the basolateral surface of epithelial cells; the complex is then transported across the cell to be secreted at the apical surface. A significant association was found between immunoglobulin A nephropathy and several SNPs in this gene. PIGR ENSG00000162896 NA
desmin 1674 This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. DES ENSG00000175084 NA
S100 calcium binding protein A8 6279 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and as a cytokine. Altered expression of this protein is associated with the disease cystic fibrosis. Multiple transcript variants encoding different isoforms have been found for this gene. S100A8 ENSG00000143546 NA
CD55 molecule (Cromer blood group) 1604 This gene encodes a glycoprotein involved in the regulation of the complement cascade. Binding of the encoded protein to complement proteins accelerates their decay, thereby disrupting the cascade and preventing damage to host cells. Antigens present on this protein constitute the Cromer blood group system (CROM). Alternative splicing results in multiple transcript variants. The predominant transcript variant encodes a membrane-bound protein, but alternatively spliced transcripts may produce soluble proteins. CD55 ENSG00000196352 NA
myosin, heavy chain 11, smooth muscle 4629 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. MYH11 ENSG00000133392 NA
delta like non-canonical Notch ligand 1 8788 This gene encodes a transmembrane protein that contains multiple epidermal growth factor repeats and functions as a regulator of cell growth. The encoded protein is involved in the differentiation of several cell types including adipocytes. This gene is located in a region of chromosome 14 frequently showing uniparental disomy, and is imprinted and expressed from the paternal allele. A single nucleotide variant in this gene is associated with child and adolescent obesity and shows polar overdominance, where heterozygotes carrying an active paternal allele express the phenotype, while mutant homozygotes are normal. DLK1 ENSG00000185559 NA
phosphorylase, glycogen; brain 5834 The protein encoded by this gene is a glycogen phosphorylase found predominantly in the brain. The encoded protein forms homodimers which can associate into homotetramers, the enzymatically active form of glycogen phosphorylase. The activity of this enzyme is positively regulated by AMP and negatively regulated by ATP, ADP, and glucose-6-phosphate. This enzyme catalyzes the rate-determining step in glycogen degradation. PYGB ENSG00000100994 NA
decorin 1634 This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. DCN ENSG00000011465 NA
ZFP36 ring finger protein-like 1 677 This gene is a member of the TIS11 family of early response genes, which are induced by various agonists such as the phorbol ester TPA and the polypeptide mitogen EGF. This gene is well conserved across species and has a promoter that contains motifs seen in other early-response genes. The encoded protein contains a distinguishing putative zinc finger domain with a repeating cys-his motif. This putative nuclear transcription factor most likely functions in regulating the response to growth factors. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ZFP36L1 ENSG00000185650 NA
cytochrome P450 family 21 subfamily A member 2 1589 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum and hydroxylates steroids at the 21 position. Its activity is required for the synthesis of steroid hormones including cortisol and aldosterone. Mutations in this gene cause congenital adrenal hyperplasia. A related pseudogene is located near this gene; gene conversion events involving the functional gene and the pseudogene are thought to account for many cases of steroid 21-hydroxylase deficiency. Two transcript variants encoding different isoforms have been found for this gene. CYP21A2 ENSG00000231852 NA
PHD finger protein 7 51533 Spermatogenesis is a complex process regulated by extracellular and intracellular factors as well as cellular interactions among interstitial cells of the testis, Sertoli cells, and germ cells. This gene is expressed in the testis in Sertoli cells but not germ cells. The protein encoded by this gene contains plant homeodomain (PHD) finger domains, also known as leukemia associated protein (LAP) domains, believed to be involved in transcriptional regulation. The protein, which localizes to the nucleus of transfected cells, has been implicated in the transcriptional regulation of spermatogenesis. Alternate splicing results in multiple transcript variants of this gene. PHF7 ENSG00000010318 NA
serine incorporator 1 57515 NA SERINC1 ENSG00000111897 NA
aldo-keto reductase family 1 member B 231 This gene encodes a member of the aldo/keto reductase superfamily, which consists of more than 40 known enzymes and proteins. This member catalyzes the reduction of a number of aldehydes, including the aldehyde form of glucose, and is thereby implicated in the development of diabetic complications by catalyzing the reduction of glucose to sorbitol. Multiple pseudogenes have been identified for this gene. The nomenclature system used by the HUGO Gene Nomenclature Committee to define human aldo-keto reductase family members is known to differ from that used by the Mouse Genome Informatics database. AKR1B1 ENSG00000085662 NA
oxidation resistance 1 55074 NA OXR1 ENSG00000164830 NA
ubiquitin specific peptidase 32 84669 NA USP32 ENSG00000170832 NA
notch 2 4853 This gene encodes a member of the Notch family. Members of this Type 1 transmembrane protein family share structural characteristics including an extracellular domain consisting of multiple epidermal growth factor-like (EGF) repeats, and an intracellular domain consisting of multiple, different domain types. Notch family members play a role in a variety of developmental processes by controlling cell fate decisions. The Notch signaling network is an evolutionarily conserved intercellular signaling pathway which regulates interactions between physically adjacent cells. In Drosophila, notch interaction with its cell-bound ligands (delta, serrate) establishes an intercellular signaling pathway that plays a key role in development. Homologues of the notch-ligands have also been identified in human, but precise interactions between these ligands and the human notch homologues remain to be determined. This protein is cleaved in the trans-Golgi network, and presented on the cell surface as a heterodimer. This protein functions as a receptor for membrane bound ligands, and may play a role in vascular, renal and hepatic development. Two transcript variants encoding different isoforms have been found for this gene. NOTCH2 ENSG00000134250 NA
HMG-box transcription factor 1 26959 NA HBP1 ENSG00000105856 NA
insulin like growth factor binding protein 5 3488 NA IGFBP5 ENSG00000115461 NA
complement component 1, s subcomponent 716 This gene encodes a serine protease, which is a major constituent of the human complement subcomponent C1. C1s associates with two other complement components C1r and C1q in order to yield the first component of the serum complement system. Defects in this gene are the cause of selective C1s deficiency. C1S ENSG00000182326 NA
zinc finger AN1-type containing 5 7763 NA ZFAND5 ENSG00000107372 NA
radixin 5962 Radixin is a cytoskeletal protein that may be important in linking actin to the plasma membrane. It is highly similar in sequence to both ezrin and moesin. The radixin gene has been localized by fluorescence in situ hybridization to 11q23. A truncated version representing a pseudogene (RDXP2) was assigned to Xp21.3. Another pseudogene that seemed to lack introns (RDXP1) was mapped to 11p by Southern and PCR analyses. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. RDX ENSG00000137710 NA
vesicle amine transport 1 10493 Synaptic vesicles are responsible for regulating the storage and release of neurotransmitters in the nerve terminal. The protein encoded by this gene is an abundant integral membrane protein of cholinergic synaptic vesicles and is thought to be involved in vesicular transport. It belongs to the quinone oxidoreductase subfamily of zinc-containing alcohol dehydrogenase proteins. VAT1 ENSG00000108828 NA
alcohol dehydrogenase 1B (class I), beta polypeptide 125 The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. ADH1B ENSG00000196616 NA
family with sequence similarity 198 member B 51313 NA FAM198B ENSG00000164125 NA
SH3 domain binding protein 5 9467 NA SH3BP5 ENSG00000131370 NA
peptidylprolyl isomerase A 5478 This gene encodes a member of the peptidyl-prolyl cis-trans isomerase (PPIase) family. PPIases catalyze the cis-trans isomerization of proline imidic peptide bonds in oligopeptides and accelerate the folding of proteins. The encoded protein is a cyclosporin binding-protein and may play a role in cyclosporin A-mediated immunosuppression. The protein can also interact with several HIV proteins, including p55 gag, Vpr, and capsid protein, and has been shown to be necessary for the formation of infectious HIV virions. Multiple pseudogenes that map to different chromosomes have been reported. PPIA ENSG00000196262 NA
nuclear factor, interleukin 3 regulated 4783 The protein encoded by this gene is a transcriptional regulator that binds as a homodimer to activating transcription factor (ATF) sites in many cellular and viral promoters. The encoded protein represses PER1 and PER2 expression and therefore plays a role in the regulation of circadian rhythm. Three transcript variants encoding the same protein have been found for this gene. NFIL3 ENSG00000165030 NA
ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 3 489 This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol to the sarcoplasmic reticulum lumen, and is involved in calcium sequestration associated with muscular excitation and contraction. Alternative splicing results in multiple transcript variants encoding different isoforms. ATP2A3 ENSG00000074370 NA
cysteine and glycine rich protein 1 1465 This gene encodes a member of the cysteine-rich protein (CSRP) family. This gene family includes a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this gene product occurs in proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Alternatively spliced transcript variants have been described. CSRP1 ENSG00000159176 NA
lysine demethylase 5B 10765 This gene encodes a lysine-specific histone demethylase that belongs to the jumonji/ARID domain-containing family of histone demethylases. The encoded protein is capable of demethylating tri-, di- and monomethylated lysine 4 of histone H3. This protein plays a role in the transcriptional repression of certain tumor suppressor genes and is upregulated in certain cancer cells. This protein may also play a role in genome stability and DNA repair. Alternate splicing results in multiple transcript variants. KDM5B ENSG00000117139 NA
arrestin domain containing 3 57561 NA ARRDC3 ENSG00000113369 NA
cytochrome c oxidase subunit 7C 1350 Cytochrome c oxidase (COX), the terminal component of the mitochondrial respiratory chain, catalyzes the electron transfer from reduced cytochrome c to oxygen. This component is a heteromeric complex consisting of 3 catalytic subunits encoded by mitochondrial genes and multiple structural subunits encoded by nuclear genes. The mitochondrially-encoded subunits function in electron transfer, and the nuclear-encoded subunits may function in the regulation and assembly of the complex. This nuclear gene encodes subunit VIIc, which shares 87% and 85% amino acid sequence identity with mouse and bovine COX VIIc, respectively, and is found in all tissues. A pseudogene COX7CP1 has been found on chromosome 13. COX7C ENSG00000127184 NA
synaptogyrin 2 9144 This gene encodes an integral membrane protein containing four transmembrane regions and a C-terminal cytoplasmic tail that is tyrosine phosphorylated. The exact function of this protein is unclear, but studies of a similar rat protein suggest that it may play a role in regulating membrane traffic in non-neuronal cells. The gene belongs to the synaptogyrin gene family. Alternative splicing results in multiple transcript variants. SYNGR2 ENSG00000108639 NA
solute carrier family 2 member 3 6515 NA SLC2A3 ENSG00000059804 NA
plexin domain containing 2 84898 NA PLXDC2 ENSG00000120594 NA
sterol regulatory element binding transcription factor 1 6720 This gene encodes a transcription factor that binds to the sterol regulatory element-1 (SRE1), which is a decamer flanking the low density lipoprotein receptor gene and some genes involved in sterol biosynthesis. The protein is synthesized as a precursor that is attached to the nuclear membrane and endoplasmic reticulum. Following cleavage, the mature protein translocates to the nucleus and activates transcription by binding to the SRE1. Sterols inhibit the cleavage of the precursor, and the mature nuclear form is rapidly catabolized, thereby reducing transcription. The protein is a member of the basic helix-loop-helix-leucine zipper (bHLH-Zip) transcription factor family. This gene is located within the Smith-Magenis syndrome region on chromosome 17. SREBF1 ENSG00000072310 NA
Kruppel like factor 6 1316 This gene encodes a member of the Kruppel-like family of transcription factors. The zinc finger protein is a transcriptional activator, and functions as a tumor suppressor. Multiple transcript variants encoding different isoforms have been found for this gene, some of which are implicated in carcinogenesis. KLF6 ENSG00000067082 NA
C-X-C motif chemokine ligand 16 58191 NA CXCL16 ENSG00000161921 NA
cytochrome c oxidase subunit 4I1 1327 Cytochrome c oxidase (COX) is the terminal enzyme of the mitochondrial respiratory chain. It is a multi-subunit enzyme complex that couples the transfer of electrons from cytochrome c to molecular oxygen and contributes to a proton electrochemical gradient across the inner mitochondrial membrane. The complex consists of 13 mitochondrial- and nuclear-encoded subunits. The mitochondrially-encoded subunits perform the electron transfer and proton pumping activities. The functions of the nuclear-encoded subunits are unknown but they may play a role in the regulation and assembly of the complex. This gene encodes the nuclear-encoded subunit IV isoform 1 of the human mitochondrial respiratory chain enzyme. It is located at the 3’ of the NOC4 (neighbor of COX4) gene in a head-to-head orientation, and shares a promoter with it. Pseudogenes related to this gene are located on chromosomes 13 and 14. Alternative splicing results in multiple transcript variants encoding different isoforms. COX4I1 ENSG00000131143 NA
solute carrier family 38 member 1 81539 Amino acid transporters play essential roles in the uptake of nutrients, production of energy, chemical metabolism, detoxification, and neurotransmitter cycling. SLC38A1 is an important transporter of glutamine, an intermediate in the detoxification of ammonia and the production of urea. Glutamine serves as a precursor for the synaptic transmitter, glutamate (Gu et al., 2001 [PubMed 11325958]). SLC38A1 ENSG00000111371 NA
stearoyl-CoA desaturase 6319 This gene encodes an enzyme involved in fatty acid biosynthesis, primarily the synthesis of oleic acid. The protein belongs to the fatty acid desaturase family and is an integral membrane protein located in the endoplasmic reticulum. Transcripts of approximately 3.9 and 5.2 kb, differing only by alternative polyadenylation signals, have been detected. A gene encoding a similar enzyme is located on chromosome 4 and a pseudogene of this gene is located on chromosome 17. SCD ENSG00000099194 NA
mucin 1, cell surface associated 4582 This gene encodes a membrane-bound protein that is a member of the mucin family. Mucins are O-glycosylated proteins that play an essential role in forming protective mucous barriers on epithelial surfaces. These proteins also play a role in intracellular signaling. This protein is expressed on the apical surface of epithelial cells that line the mucosal surfaces of many different tissues including lung, breast, stomach and pancreas. This protein is proteolytically cleaved into alpha and beta subunits that form a heterodimeric complex. The N-terminal alpha subunit functions in cell-adhesion and the C-terminal beta subunit is involved in cell signaling. Overexpression, aberrant intracellular localization, and changes in glycosylation of this protein have been associated with carcinomas. This gene is known to contain a highly polymorphic variable number tandem repeats (VNTR) domain. Alternate splicing results in multiple transcript variants. MUC1 ENSG00000185499 NA
DnaJ heat shock protein family (Hsp40) member B1 3337 This gene encodes a member of the DnaJ or Hsp40 (heat shock protein 40 kD) family of proteins. DNAJ family members are characterized by a highly conserved amino acid stretch called the ‘J-domain’ and function as one of the two major classes of molecular chaperones involved in a wide range of cellular events, such as protein folding and oligomeric protein complex assembly. The encoded protein is a molecular chaperone that stimulates the ATPase activity of Hsp70 heat-shock proteins in order to promote protein folding and prevent misfolded protein aggregation. Alternative splicing results in multiple transcript variants. DNAJB1 ENSG00000132002 NA
tropomyosin 2 (beta) 7169 This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. TPM2 ENSG00000198467 NA
CKLF like MARVEL transmembrane domain containing 2 146225 This gene belongs to the chemokine-like factor gene superfamily, a novel family that links the chemokine and the transmembrane 4 superfamilies of signaling molecules. The protein encoded by this gene may play an important role in testicular development. CMTM2 ENSG00000140932 NA
myotubularin related protein 3 8897 This gene encodes a member of the myotubularin dual specificity protein phosphatase gene family. The encoded protein is structurally similar to myotubularin but in addition contains a FYVE domain and an N-terminal PH-GRAM domain. The protein can self-associate and also form heteromers with another myotubularin related protein. The protein binds to phosphoinositide lipids through the PH-GRAM domain, and can hydrolyze phosphatidylinositol(3)-phosphate and phosphatidylinositol(3,5)-biphosphate in vitro. The encoded protein has been observed to have a perinuclear, possibly membrane-bound, distribution in cells, but it has also been found free in the cytoplasm. Multiple transcript variants encoding different isoforms have been found for this gene. MTMR3 ENSG00000100330 NA
lumican 4060 This gene encodes a member of the small leucine-rich proteoglycan (SLRP) family that includes decorin, biglycan, fibromodulin, keratocan, epiphycan, and osteoglycin. In these bifunctional molecules, the protein moiety binds collagen fibrils and the highly charged hydrophilic glycosaminoglycans regulate interfibrillar spacings. Lumican is the major keratan sulfate proteoglycan of the cornea but is also distributed in interstitial collagenous matrices throughout the body. Lumican may regulate collagen fibril organization and circumferential growth, corneal transparency, and epithelial cell migration and tissue repair. LUM ENSG00000139329 NA
receptor accessory protein 6 92840 NA REEP6 ENSG00000115255 NA
ribosomal protein L27a 6157 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L15P family of ribosomal proteins. It is located in the cytoplasm. Variable expression of this gene in colorectal cancers compared to adjacent normal tissues has been observed, although no correlation between the level of expression and the severity of the disease has been found. As is typical for genes encoding ribosomal proteins, multiple processed pseudogenes derived from this gene are dispersed through the genome. RPL27A ENSG00000166441 NA
huntingtin interacting protein 1 3092 The product of this gene is a membrane-associated protein that functions in clathrin-mediated endocytosis and protein trafficking within the cell. The encoded protein binds to the huntingtin protein in the brain; this interaction is lost in Huntington’s disease. Alternative splicing results in multiple transcript variants. HIP1 ENSG00000127946 NA
fibronectin 1 2335 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. FN1 ENSG00000115414 NA
sestrin 3 143686 This gene encodes a member of the sestrin family of stress-induced proteins. The encoded protein reduces the levels of intracellular reactive oxygen species induced by activated Ras downstream of RAC-alpha serine/threonine-protein kinase (Akt) and FoxO transcription factor. The protein is required for normal regulation of blood glucose, insulin resistance and plays a role in lipid storage in obesity. Alternative splicing results in multiple transcript variants. SESN3 ENSG00000149212 NA
transducin like enhancer of split 4 7091 NA TLE4 ENSG00000106829 NA
zinc finger E-box binding homeobox 2 9839 The protein encoded by this gene is a member of the Zfh1 family of 2-handed zinc finger/homeodomain proteins. It is located in the nucleus and functions as a DNA-binding transcriptional repressor that interacts with activated SMADs. Mutations in this gene are associated with Hirschsprung disease/Mowat-Wilson syndrome. Alternatively spliced transcript variants have been found for this gene. ZEB2 ENSG00000169554 NA
immunoglobulin lambda constant 1 (Mcg marker) ENSG00000211675 NA IGLC1 ENSG00000211675 NA
NA NA NA NA ENSG00000090920 TRUE
immunoglobulin lambda like polypeptide 5 100423062 This gene encodes one of the immunoglobulin lambda-like polypeptides. It is located within the immunoglobulin lambda locus but it does not require somatic rearrangement for expression. The first exon of this gene is unrelated to immunoglobulin variable genes; the second and third exons are the immunoglobulin lambda joining 1 and the immunoglobulin lambda constant 1 gene segments. Alternative splicing results in multiple transcript variants. IGLL5 ENSG00000254709 NA
solute carrier family 25 member 3 5250 The protein encoded by this gene catalyzes the transport of phosphate into the mitochondrial matrix, either by proton cotransport or in exchange for hydroxyl ions. The protein contains three related segments arranged in tandem which are related to those found in other characterized members of the mitochondrial carrier family. Both the N-terminal and C-terminal regions of this protein protrude toward the cytosol. Multiple alternatively spliced transcript variants have been isolated. SLC25A3 ENSG00000075415 NA
5’-aminolevulinate synthase 1 211 This gene encodes the mitochondrial enzyme which catalyzes the rate-limiting step in heme (iron-protoporphyrin) biosynthesis. The enzyme encoded by this gene is the housekeeping enzyme; a separate gene encodes a form of the enzyme that is specific for erythroid tissue. The level of the mature encoded protein is regulated by heme: high levels of heme down-regulate the mature enzyme in mitochondria while low heme levels up-regulate. A pseudogene of this gene is located on chromosome 12. Alternative splicing results in multiple transcript variants encoding different isoforms. ALAS1 ENSG00000023330 NA
complement factor H 3075 This gene is a member of the Regulator of Complement Activation (RCA) gene cluster and encodes a protein with twenty short consensus repeat (SCR) domains. This protein is secreted into the bloodstream and has an essential role in the regulation of complement activation, restricting this innate defense mechanism to microbial infections. Mutations in this gene have been associated with hemolytic-uremic syndrome (HUS) and chronic hypocomplementemic nephropathy. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. CFH ENSG00000000971 NA
serpin family A member 1 5265 The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. SERPINA1 ENSG00000197249 NA
NADH:ubiquinone oxidoreductase subunit S5 4725 This gene is a member of the NADH dehydrogenase (ubiquinone) iron-sulfur protein family. The encoded protein is a subunit of the NADH:ubiquinone oxidoreductase (complex I), the first enzyme complex in the electron transport chain located in the inner mitochondrial membrane. Alternative splicing results in multiple transcript variants and pseudogenes have been identified on chromosomes 1, 4 and 17. NDUFS5 ENSG00000168653 NA
H19, imprinted maternally expressed transcript (non-protein coding) 283120 This gene is located in an imprinted region of chromosome 11 near the insulin-like growth factor 2 (IGF2) gene. This gene is only expressed from the maternally-inherited chromosome, whereas IGF2 is only expressed from the paternally-inherited chromosome. The product of this gene is a long non-coding RNA which functions as a tumor suppressor. Mutations in this gene have been associated with Beckwith-Wiedemann Syndrome and Wilms tumorigenesis. Alternative splicing results in multiple transcript variants. H19 ENSG00000130600 NA
basic helix-loop-helix family member e40 8553 This gene encodes a basic helix-loop-helix protein expressed in various tissues. The encoded protein can interact with ARNTL or compete for E-box binding sites in the promoter of PER1 and repress CLOCK/ARNTL’s transactivation of PER1. This gene is believed to be involved in the control of circadian rhythm and cell differentiation. BHLHE40 ENSG00000134107 NA
small nucleolar RNA host gene 3 8420 NA SNHG3 ENSG00000242125 NA
reticulon 4 57142 This gene belongs to the family of reticulon encoding genes. Reticulons are associated with the endoplasmic reticulum, and are involved in neuroendocrine secretion or in membrane trafficking in neuroendocrine cells. The product of this gene is a potent neurite outgrowth inhibitor which may also help block the regeneration of the central nervous system in higher vertebrates. Alternatively spliced transcript variants derived both from differential splicing and differential promoter usage and encoding different isoforms have been identified. RTN4 ENSG00000115310 NA
# Write the queried Ensembl gene IDs for factor 5 to a text file, one per line.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 5, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)

Factor 6 Annotations

out <- mygene::queryMany(gene_list[6,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id summary name symbol query notfound
3043 The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. hemoglobin subunit beta HBB ENSG00000244734 NA
1281 This gene encodes the pro-alpha1 chains of type III collagen, a fibrillar collagen that is found in extensible connective tissues such as skin, lung, uterus, intestine and the vascular system, frequently in association with type I collagen. Mutations in this gene are associated with Ehlers-Danlos syndrome types IV, and with aortic and arterial aneurysms. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. collagen type III alpha 1 chain COL3A1 ENSG00000168542 NA
3848 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 1 KRT1 ENSG00000167768 NA
5284 This gene is a member of the immunoglobulin superfamily. The encoded poly-Ig receptor binds polymeric immunoglobulin molecules at the basolateral surface of epithelial cells; the complex is then transported across the cell to be secreted at the apical surface. A significant association was found between immunoglobulin A nephropathy and several SNPs in this gene. polymeric immunoglobulin receptor PIGR ENSG00000162896 NA
70 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). actin, alpha, cardiac muscle 1 ACTC1 ENSG00000159251 NA
2167 FABP4 encodes the fatty acid binding protein found in adipocytes. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. It is thought that FABPs roles include fatty acid uptake, transport, and metabolism. fatty acid binding protein 4 FABP4 ENSG00000170323 NA
6515 NA solute carrier family 2 member 3 SLC2A3 ENSG00000059804 NA
23710 NA GABA type A receptor associated protein like 1 GABARAPL1 ENSG00000139112 NA
7145 The protein encoded by this gene localizes to focal adhesions, regions of the plasma membrane where the cell attaches to the extracellular matrix. This protein crosslinks actin filaments and contains a Src homology 2 (SH2) domain, which is often found in molecules involved in signal transduction. This protein is a substrate of calpain II. Alternative splicing results in multiple transcript variants encoding different isoforms. tensin 1 TNS1 ENSG00000079308 NA
7450 This gene encodes a glycoprotein involved in hemostasis. The encoded preproprotein is proteolytically processed following assembly into large multimeric complexes. These complexes function in the adhesion of platelets to sites of vascular injury and the transport of various proteins in the blood. Mutations in this gene result in von Willebrand disease, an inherited bleeding disorder. An unprocessed pseudogene has been found on chromosome 22. von Willebrand factor VWF ENSG00000110799 NA
ENSG00000211890 NA immunoglobulin heavy constant alpha 2 (A2m marker) IGHA2 ENSG00000211890 NA
3326 This gene encodes a member of the heat shock protein 90 family; these proteins are involved in signal transduction, protein folding and degradation and morphological evolution. This gene encodes the constitutive form of the cytosolic 90 kDa heat-shock protein and is thought to play a role in gastric apoptosis and inflammation. Alternative splicing results in multiple transcript variants. Pseudogenes have been identified on multiple chromosomes. heat shock protein 90kDa alpha family class B member 1 HSP90AB1 ENSG00000096384 NA
1674 This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. desmin DES ENSG00000175084 NA
60 This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. actin, beta ACTB ENSG00000075624 NA
2621 This gene encodes a gamma-carboxyglutamic acid (Gla)-containing protein thought to be involved in the stimulation of cell proliferation. This gene is frequently overexpressed in many cancers and has been implicated as an adverse prognostic marker. Elevated protein levels are additionally associated with a variety of disease states, including venous thromboembolic disease, systemic lupus erythematosus, chronic renal failure, and preeclampsia. growth arrest specific 6 GAS6 ENSG00000183087 NA
8553 This gene encodes a basic helix-loop-helix protein expressed in various tissues. The encoded protein can interact with ARNTL or compete for E-box binding sites in the promoter of PER1 and repress CLOCK/ARNTL’s transactivation of PER1. This gene is believed to be involved in the control of circadian rhythm and cell differentiation. basic helix-loop-helix family member e40 BHLHE40 ENSG00000134107 NA
5644 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. protease, serine 1 PRSS1 ENSG00000204983 NA
2194 The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. fatty acid synthase FASN ENSG00000169710 NA
25802 The leiomodin 1 protein has a putative membrane-spanning region and 2 types of tandemly repeated blocks. The transcript is expressed in all tissues tested, with the highest levels in thyroid, eye muscle, skeletal muscle, and ovary. Increased expression of leiomodin 1 may be linked to Graves’ disease and thyroid-associated ophthalmopathy. leiomodin 1 LMOD1 ENSG00000163431 NA
1158 The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. creatine kinase, M-type CKM ENSG00000104879 NA
28231 NA solute carrier organic anion transporter family member 4A1 SLCO4A1 ENSG00000101187 NA
3329 This gene encodes a member of the chaperonin family. The encoded mitochondrial protein may function as a signaling molecule in the innate immune system. This protein is essential for the folding and assembly of newly imported proteins in the mitochondria. This gene is adjacent to a related family member and the region between the 2 genes functions as a bidirectional promoter. Several pseudogenes have been associated with this gene. Two transcript variants encoding the same protein have been identified for this gene. Mutations associated with this gene cause autosomal recessive spastic paraplegia 13. heat shock protein family D (Hsp60) member 1 HSPD1 ENSG00000144381 NA
4604 This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. myosin binding protein C, slow type MYBPC1 ENSG00000196091 NA
27063 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. ankyrin repeat domain 1 ANKRD1 ENSG00000148677 NA
6876 The protein encoded by this gene is a transformation and shape-change sensitive actin cross-linking/gelling protein found in fibroblasts and smooth muscle. Its expression is down-regulated in many cell lines, and this down-regulation may be an early and sensitive marker for the onset of transformation. A functional role of this protein is unclear. Two transcript variants encoding the same protein have been found for this gene. transgelin TAGLN ENSG00000149591 NA
290 Aminopeptidase N is located in the small-intestinal and renal microvillar membrane, and also in other plasma membranes. In the small intestine aminopeptidase N plays a role in the final digestion of peptides generated from hydrolysis of proteins by gastric and pancreatic proteases. Its function in proximal tubular epithelial cells and other cell types is less clear. The large extracellular carboxyterminal domain contains a pentapeptide consensus sequence characteristic of members of the zinc-binding metalloproteinase superfamily. Sequence comparisons with known enzymes of this class showed that CD13 and aminopeptidase N are identical. The latter enzyme was thought to be involved in the metabolism of regulatory peptides by diverse cell types, including small intestinal and renal tubular epithelial cells, macrophages, granulocytes, and synaptic membranes from the CNS. Human aminopeptidase N is a receptor for one strain of human coronavirus that is an important cause of upper respiratory tract infections. Defects in this gene appear to be a cause of various types of leukemia or lymphoma. alanyl aminopeptidase, membrane ANPEP ENSG00000166825 NA
1357 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. carboxypeptidase A1 CPA1 ENSG00000091704 NA
8013 This gene encodes a member of the steroid-thyroid hormone-retinoid receptor superfamily. The encoded protein may act as a transcriptional activator. The protein can efficiently bind the NGFI-B Response Element (NBRE). Three different versions of extraskeletal myxoid chondrosarcomas (EMCs) are the result of reciprocal translocations between this gene and other genes. The translocation breakpoints are associated with Nuclear Receptor Subfamily 4, Group A, Member 3 (on chromosome 9) and either Ewing Sarcoma Breakpoint Region 1 (on chromosome 22), RNA Polymerase II, TATA Box-Binding Protein-Associated Factor, 68-KD (on chromosome 17), or Transcription factor 12 (on chromosome 15). Multiple transcript variants encoding different isoforms have been found for this gene. nuclear receptor subfamily 4 group A member 3 NR4A3 ENSG00000119508 NA
8991 This gene encodes a member of the selenium-binding protein family. Selenium is an essential nutrient that exhibits potent anticarcinogenic properties, and deficiency of selenium may cause certain neurologic diseases. The effects of selenium in preventing cancer and neurologic diseases may be mediated by selenium-binding proteins, and decreased expression of this gene may be associated with several types of cancer. The encoded protein may play a selenium-dependent role in ubiquitination/deubiquitination-mediated protein degradation. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. selenium binding protein 1 SELENBP1 ENSG00000143416 NA
NA NA NA NA ENSG00000090920 TRUE
5967 This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. regenerating family member 1 alpha REG1A ENSG00000115386 NA
10399 NA receptor for activated C kinase 1 RACK1 ENSG00000204628 NA
7532 This gene product belongs to the 14-3-3 family of proteins which mediate signal transduction by binding to phosphoserine-containing proteins. This highly conserved protein family is found in both plants and mammals, and this protein is 100% identical to the rat ortholog. It is induced by growth factors in human vascular smooth muscle cells, and is also highly expressed in skeletal and heart muscles, suggesting an important role for this protein in muscle tissue. It has been shown to interact with RAF1 and protein kinase C, proteins involved in various signal transduction pathways. tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein gamma YWHAG ENSG00000170027 NA
3858 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. keratin 10 KRT10 ENSG00000186395 NA
5837 This gene encodes a muscle enzyme involved in glycogenolysis. Highly similar enzymes encoded by different genes are found in liver and brain. Mutations in this gene are associated with McArdle disease (myophosphorylase deficiency), a glycogen storage disease of muscle. Alternative splicing results in multiple transcript variants. phosphorylase, glycogen, muscle PYGM ENSG00000068976 NA
55466 NA DnaJ heat shock protein family (Hsp40) member A4 DNAJA4 ENSG00000140403 NA
54541 NA DNA damage inducible transcript 4 DDIT4 ENSG00000168209 NA
5406 This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. pancreatic lipase PNLIP ENSG00000175535 NA
3860 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. keratin 13 KRT13 ENSG00000171401 NA
2821 This gene encodes a member of the glucose phosphate isomerase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. In the cytoplasm, the gene product functions as a glycolytic enzyme (glucose-6-phosphate isomerase) that interconverts glucose-6-phosphate and fructose-6-phosphate. Extracellularly, the encoded protein (also referred to as neuroleukin) functions as a neurotrophic factor that promotes survival of skeletal motor neurons and sensory neurons, and as a lymphokine that induces immunoglobulin secretion. The encoded protein is also referred to as autocrine motility factor based on an additional function as a tumor-secreted cytokine and angiogenic factor. Defects in this gene are the cause of nonspherocytic hemolytic anemia and a severe enzyme deficiency can be associated with hydrops fetalis, immediate neonatal death and neurological impairment. Alternative splicing results in multiple transcript variants. glucose-6-phosphate isomerase GPI ENSG00000105220 NA
3851 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 4 KRT4 ENSG00000170477 NA
301 This gene encodes a membrane-localized protein that binds phospholipids. This protein inhibits phospholipase A2 and has anti-inflammatory activity. Loss of function or expression of this gene has been detected in multiple tumors. annexin A1 ANXA1 ENSG00000135046 NA
213 Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. albumin ALB ENSG00000163631 NA
30819 This gene encodes a member of the family of voltage-gated potassium (Kv) channel-interacting proteins (KCNIPs), which belongs to the recoverin branch of the EF-hand superfamily. Members of the KCNIP family are small calcium binding proteins. They all have EF-hand-like domains, and differ from each other in the N-terminus. They are integral subunit components of native Kv4 channel complexes. They may regulate A-type currents, and hence neuronal excitability, in response to changes in intracellular calcium. Multiple alternatively spliced transcript variants encoding distinct isoforms have been identified from this gene. potassium voltage-gated channel interacting protein 2 KCNIP2 ENSG00000120049 NA
10136 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. chymotrypsin like elastase family member 3A CELA3A ENSG00000142789 NA
7832 The protein encoded by this gene is a member of the BTG/Tob family. This family has structurally related proteins that appear to have antiproliferative properties. This encoded protein is involved in the regulation of the G1/S transition of the cell cycle. BTG family member 2 BTG2 ENSG00000159388 NA
59 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. actin, alpha 2, smooth muscle, aorta ACTA2 ENSG00000107796 NA
7128 This gene was identified as a gene whose expression is rapidly induced by the tumor necrosis factor (TNF). The protein encoded by this gene is a zinc finger protein and ubiquitin-editing enzyme, and has been shown to inhibit NF-kappa B activation as well as TNF-mediated apoptosis. The encoded protein, which has both ubiquitin ligase and deubiquitinase activities, is involved in the cytokine-mediated immune and inflammatory responses. Several transcript variants encoding the same protein have been found for this gene. TNF alpha induced protein 3 TNFAIP3 ENSG00000118503 NA
123 The protein encoded by this gene belongs to the perilipin family, members of which coat intracellular lipid storage droplets. This protein is associated with the lipid globule surface membrane material, and may be involved in development and maintenance of adipose tissue. However, it is not restricted to adipocytes as previously thought, but is found in a wide range of cultured cell lines, including fibroblasts, endothelial and epithelial cells, and tissues, such as lactating mammary gland, adrenal cortex, Sertoli and Leydig cells, and hepatocytes in alcoholic liver cirrhosis, suggesting that it may serve as a marker of lipid accumulation in diverse cell types and diseases. Alternatively spliced transcript variants have been found for this gene. perilipin 2 PLIN2 ENSG00000147872 NA
284119 This gene encodes a protein that enables the dissociation of paused ternary polymerase I transcription complexes from the 3’ end of pre-rRNA transcripts. This protein regulates rRNA transcription by promoting the dissociation of transcription complexes and the reinitiation of polymerase I on nascent rRNA transcripts. This protein also localizes to caveolae at the plasma membrane and is thought to play a critical role in the formation of caveolae and the stabilization of caveolins. This protein translocates from caveolae to the cytoplasm after insulin stimulation. Caveolae contain truncated forms of this protein and may be the site of phosphorylation-dependent proteolysis. This protein is also thought to modify lipid metabolism and insulin-regulated gene expression. Mutations in this gene result in a disorder characterized by generalized lipodystrophy and muscular dystrophy. polymerase I and transcript release factor PTRF ENSG00000177469 NA
72 Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. actin, gamma 2, smooth muscle, enteric ACTG2 ENSG00000163017 NA
1465 This gene encodes a member of the cysteine-rich protein (CSRP) family. This gene family includes a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this gene product occurs in proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Alternatively spliced transcript variants have been described. cysteine and glycine rich protein 1 CSRP1 ENSG00000159176 NA
3725 This gene is the putative transforming gene of avian sarcoma virus 17. It encodes a protein which is highly similar to the viral protein, and which interacts directly with specific target DNA sequences to regulate gene expression. This gene is intronless and is mapped to 1p32-p31, a chromosomal region involved in both translocations and deletions in human malignancies. Jun proto-oncogene, AP-1 transcription factor subunit JUN ENSG00000177606 NA
3960 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. The expression of this gene is restricted to small intestine, colon, and rectum, and it is underexpressed in colorectal cancer. galectin 4 LGALS4 ENSG00000171747 NA
2813 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. glycoprotein 2 GP2 ENSG00000169347 NA
64651 This gene encodes a protein that localizes to the nucleus and expression of this gene is induced in response to elevated levels of axin. The Wnt signalling pathway, which is negatively regulated by axin, is important in axis formation in early development and impaired regulation of this signalling pathway is often involved in tumors. A decreased level of expression of this gene in tumors compared to the level of expression in their corresponding normal tissues suggests that this gene product has a tumor suppressor function. Alternative splicing results in multiple transcript variants. cysteine and serine rich nuclear protein 1 CSRNP1 ENSG00000144655 NA
1401 The protein encoded by this gene belongs to the pentaxin family. It is involved in several host defense related functions based on its ability to recognize foreign pathogens and damaged cells of the host and to initiate their elimination by interacting with humoral and cellular effector systems in the blood. Consequently, the level of this protein in plasma increases greatly during acute phase response to tissue injury, infection, or other inflammatory stimuli. C-reactive protein, pentraxin-related CRP ENSG00000132693 NA
7422 This gene is a member of the PDGF/VEGF growth factor family. It encodes a heparin-binding protein, which exists as a disulfide-linked homodimer. This growth factor induces proliferation and migration of vascular endothelial cells, and is essential for both physiological and pathological angiogenesis. Disruption of this gene in mice resulted in abnormal embryonic blood vessel formation. This gene is upregulated in many known tumors and its expression is correlated with tumor stage and progression. Elevated levels of this protein are found in patients with POEMS syndrome, also known as Crow-Fukase syndrome. Allelic variants of this gene have been associated with microvascular complications of diabetes 1 (MVCD1) and atherosclerosis. Alternatively spliced transcript variants encoding different isoforms have been described. There is also evidence for alternative translation initiation from upstream non-AUG (CUG) codons resulting in additional isoforms. A recent study showed that a C-terminally extended isoform is produced by use of an alternative in-frame translation termination codon via a stop codon readthrough mechanism, and that this isoform is antiangiogenic. Expression of some isoforms derived from the AUG start codon is regulated by a small upstream open reading frame, which is located within an internal ribosome entry site. vascular endothelial growth factor A VEGFA ENSG00000112715 NA
7273 This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. titin TTN ENSG00000155657 NA
730 C7 is a component of the complement system. It participates in the formation of Membrane Attack Complex (MAC). People with C7 deficiency are prone to bacterial infection. complement component 7 C7 ENSG00000112936 NA
4624 Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. myosin, heavy chain 6, cardiac muscle, alpha MYH6 ENSG00000197616 NA
7316 This gene represents a ubiquitin gene, ubiquitin C. The encoded protein is a polyubiquitin precursor. Conjugation of ubiquitin monomers or polymers can lead to various effects within a cell, depending on the residues to which ubiquitin is conjugated. Ubiquitination has been associated with protein degradation, DNA repair, cell cycle regulation, kinase modification, endocytosis, and regulation of other cell signaling pathways. ubiquitin C UBC ENSG00000150991 NA
800 This gene encodes a calmodulin- and actin-binding protein that plays an essential role in the regulation of smooth muscle and nonmuscle contraction. The conserved domain of this protein possesses the binding activities to Ca(2+)-calmodulin, actin, tropomyosin, myosin, and phospholipids. This protein is a potent inhibitor of the actin-tropomyosin activated myosin MgATPase, and serves as a mediating factor for Ca(2+)-dependent inhibition of smooth muscle contraction. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. caldesmon 1 CALD1 ENSG00000122786 NA
87 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, cytoskeletal, alpha actinin isoform and maps to the same site as the structurally similar erythroid beta spectrin gene. Three transcript variants encoding different isoforms have been found for this gene. actinin alpha 1 ACTN1 ENSG00000072110 NA
4023 LPL encodes lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue. LPL functions as a homodimer, and has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Severe mutations that cause LPL deficiency result in type I hyperlipoproteinemia, while less extreme mutations in LPL are linked to many disorders of lipoprotein metabolism. lipoprotein lipase LPL ENSG00000175445 NA
1360 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma. carboxypeptidase B1 CPB1 ENSG00000153002 NA
1844 The protein encoded by this gene is a member of the dual specificity protein phosphatase subfamily. These phosphatases inactivate their target kinases by dephosphorylating both the phosphoserine/threonine and phosphotyrosine residues. They negatively regulate members of the mitogen-activated protein (MAP) kinase superfamily (MAPK/ERK, SAPK/JNK, p38), which are associated with cellular proliferation and differentiation. Different members of the family of dual specificity phosphatases show distinct substrate specificities for various MAP kinases, different tissue distribution and subcellular localization, and different modes of inducibility of their expression by extracellular stimuli. This gene product inactivates ERK1 and ERK2, is predominantly expressed in hematopoietic tissues, and is localized in the nucleus. dual specificity phosphatase 2 DUSP2 ENSG00000158050 NA
4666 This gene encodes a protein that associates with basic transcription factor 3 (BTF3) to form the nascent polypeptide-associated complex (NAC). This complex binds to nascent proteins that lack a signal peptide motif as they emerge from the ribosome, blocking interaction with the signal recognition particle (SRP) and preventing mistranslocation to the endoplasmic reticulum. This protein is an IgE autoantigen in atopic dermatitis patients. Alternative splicing results in multiple transcript variants, but the full length nature of some of these variants, including those encoding very large proteins, has not been determined. There are multiple pseudogenes of this gene on different chromosomes. nascent polypeptide-associated complex alpha subunit NACA ENSG00000196531 NA
1958 The protein encoded by this gene belongs to the EGR family of C2H2-type zinc-finger proteins. It is a nuclear protein and functions as a transcriptional regulator. The products of target genes it activates are required for differentiation and mitogenesis. Studies suggest this is a cancer suppressor gene. early growth response 1 EGR1 ENSG00000120738 NA
6289 NA serum amyloid A2 SAA2 ENSG00000134339 NA
316 Aldehyde oxidase produces hydrogen peroxide and, under certain conditions, can catalyze the formation of superoxide. Aldehyde oxidase is a candidate gene for amyotrophic lateral sclerosis. aldehyde oxidase 1 AOX1 ENSG00000138356 NA
5950 This protein belongs to the lipocalin family and is the specific carrier for retinol (vitamin A alcohol) in the blood. It delivers retinol from the liver stores to the peripheral tissues. In plasma, the RBP-retinol complex interacts with transthyretin which prevents its loss by filtration through the kidney glomeruli. A deficiency of vitamin A blocks secretion of the binding protein posttranslationally and results in defective delivery and supply to the epidermal cells. retinol binding protein 4 RBP4 ENSG00000138207 NA
9308 The protein encoded by this gene is a single-pass type I membrane protein and member of the immunoglobulin superfamily of receptors. The encoded protein may be involved in the regulation of antigen presentation. A soluble form of this protein can bind to dendritic cells and inhibit their maturation. Three transcript variants encoding different isoforms have been found for this gene. CD83 molecule CD83 ENSG00000112149 NA
7170 This gene encodes a member of the tropomyosin family of actin-binding proteins. Tropomyosins are dimers of coiled-coil proteins that provide stability to actin filaments and regulate access of other actin-binding proteins. Mutations in this gene result in autosomal dominant nemaline myopathy and other muscle disorders. This locus is involved in translocations with other loci, including anaplastic lymphoma receptor tyrosine kinase (ALK) and neurotrophic tyrosine kinase receptor type 1 (NTRK1), which result in the formation of fusion proteins that act as oncogenes. There are numerous pseudogenes for this gene on different chromosomes. Alternative splicing results in multiple transcript variants. tropomyosin 3 TPM3 ENSG00000143549 NA
4607 MYBPC3 encodes the cardiac isoform of myosin-binding protein C. Myosin-binding protein C is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. MYBPC3, the cardiac isoform, is expressed exclusively in heart muscle. Regulatory phosphorylation of the cardiac isoform in vivo by cAMP-dependent protein kinase (PKA) upon adrenergic stimulation may be linked to modulation of cardiac contraction. Mutations in MYBPC3 are one cause of familial hypertrophic cardiomyopathy. myosin binding protein C, cardiac MYBPC3 ENSG00000134571 NA
1901 The protein encoded by this gene is structurally similar to G protein-coupled receptors and is highly expressed in endothelial cells. It binds the ligand sphingosine-1-phosphate with high affinity and high specificity, and is suggested to be involved in the processes that regulate the differentiation of endothelial cells. Activation of this receptor induces cell-cell adhesion. Alternative splicing results in multiple transcript variants. sphingosine-1-phosphate receptor 1 S1PR1 ENSG00000170989 NA
4190 This gene encodes an enzyme that catalyzes the NAD/NADH-dependent, reversible oxidation of malate to oxaloacetate in many metabolic pathways, including the citric acid cycle. Two main isozymes are known to exist in eukaryotic cells: one is found in the mitochondrial matrix and the other in the cytoplasm. This gene encodes the cytosolic isozyme, which plays a key role in the malate-aspartate shuttle that allows malate to pass through the mitochondrial membrane to be transformed into oxaloacetate for further cellular processes. Alternatively spliced transcript variants have been found for this gene. A recent study showed that a C-terminally extended isoform is produced by use of an alternative in-frame translation termination codon via a stop codon readthrough mechanism, and that this isoform is localized in the peroxisomes. Pseudogenes have been identified on chromosomes X and 6. malate dehydrogenase 1 MDH1 ENSG00000014641 NA
10252 NA sprouty RTK signaling antagonist 1 SPRY1 ENSG00000164056 NA
1208 The protein encoded by this gene is a cofactor needed by pancreatic lipase for efficient dietary lipid hydrolysis. It binds to the C-terminal, non-catalytic domain of lipase, thereby stabilizing an active conformation and considerably increasing the overall hydrophobic binding site. The gene product allows lipase to anchor noncovalently to the surface of lipid micelles, counteracting the destabilizing influence of intestinal bile salts. This cofactor is only expressed in pancreatic acinar cells, suggesting regulation of expression by tissue-specific elements. Three transcript variants encoding different isoforms have been found for this gene. colipase CLPS ENSG00000137392 NA
4619 Myosin is a major contractile protein which converts chemical energy into mechanical energy through the hydrolysis of ATP. Myosin is a hexameric protein composed of a pair of myosin heavy chains (MYH) and two pairs of nonidentical light chains. Myosin heavy chains are encoded by a multigene family. In mammals at least 10 different myosin heavy chain (MYH) isoforms have been described from striated, smooth, and nonmuscle cells. These isoforms show expression that is spatially and temporally regulated during development. myosin, heavy chain 1, skeletal muscle, adult MYH1 ENSG00000109061 NA
80005 NA dedicator of cytokinesis 5 DOCK5 ENSG00000147459 NA
ENSG00000269926 NA NA RP11-442H21.2 ENSG00000269926 NA
667 This gene encodes a member of the plakin protein family of adhesion junction plaque proteins. Multiple alternatively spliced transcript variants encoding distinct isoforms have been found for this gene, but the full-length nature of some variants has not been defined. It has been reported that some isoforms are expressed in neural and muscle tissue, anchoring neural intermediate filaments to the actin cytoskeleton, and some isoforms are expressed in epithelial tissue, anchoring keratin-containing intermediate filaments to hemidesmosomes. Consistent with the expression, mice defective for this gene show skin blistering and neurodegeneration. dystonin DST ENSG00000151914 NA
718 Complement component C3 plays a central role in the activation of complement system. Its activation is required for both classical and alternative complement activation pathways. The encoded preproprotein is proteolytically processed to generate alpha and beta subunits that form the mature protein, which is then further processed to generate numerous peptide products. The C3a peptide, also known as the C3a anaphylatoxin, modulates inflammation and possesses antimicrobial activity. Mutations in this gene are associated with atypical hemolytic uremic syndrome and age-related macular degeneration in human patients. complement component 3 C3 ENSG00000125730 NA
23436 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3B has little elastolytic activity. Like most of the human elastases, elastase 3B is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3B preferentially cleaves proteins after alanine residues. Elastase 3B may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1, and excretion of this protein in fecal material is frequently used as a measure of pancreatic function in clinical assays. chymotrypsin like elastase family member 3B CELA3B ENSG00000219073 NA
4620 Myosins are actin-based motor proteins that function in the generation of mechanical force in eukaryotic cells. Muscle myosins are heterohexamers composed of 2 myosin heavy chains and 2 pairs of nonidentical myosin light chains. This gene encodes a member of the class II or conventional myosin heavy chains, and functions in skeletal muscle contraction. This gene is found in a cluster of myosin heavy chain genes on chromosome 17. A mutation in this gene results in inclusion body myopathy-3. Multiple alternatively spliced variants, encoding the same protein, have been identified. myosin, heavy chain 2, skeletal muscle, adult MYH2 ENSG00000125414 NA
6175 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein, which is the functional equivalent of the E. coli L10 ribosomal protein, belongs to the L10P family of ribosomal proteins. It is a neutral phosphoprotein with a C-terminal end that is nearly identical to the C-terminal ends of the acidic ribosomal phosphoproteins P1 and P2. The P0 protein can interact with P1 and P2 to form a pentameric complex consisting of P1 and P2 dimers, and a P0 monomer. The protein is located in the cytoplasm. Transcript variants derived from alternative splicing exist; they encode the same protein. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. ribosomal protein lateral stalk subunit P0 RPLP0 ENSG00000089157 NA
7139 The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms, however, the full-length nature of some of these variants has not yet been determined. troponin T2, cardiac type TNNT2 ENSG00000118194 NA
7314 This gene encodes ubiquitin, one of the most conserved proteins known. Ubiquitin has a major role in targeting cellular proteins for degradation by the 26S proteasome. It is also involved in the maintenance of chromatin structure, the regulation of gene expression, and the stress response. Ubiquitin is synthesized as a precursor protein consisting of either polyubiquitin chains or a single ubiquitin moiety fused to an unrelated protein. This gene consists of three direct repeats of the ubiquitin coding sequence with no spacer sequence. Consequently, the protein is expressed as a polyubiquitin precursor with a final amino acid after the last repeat. An aberrant form of this protein has been detected in patients with Alzheimer’s disease and Down syndrome. Pseudogenes of this gene are located on chromosomes 1, 2, 13, and 17. Alternative splicing results in multiple transcript variants. ubiquitin B UBB ENSG00000170315 NA
7077 This gene is a member of the TIMP gene family. The proteins encoded by this gene family are natural inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix. In addition to an inhibitory role against metalloproteinases, the encoded protein has a unique role among TIMP family members in its ability to directly suppress the proliferation of endothelial cells. As a result, the encoded protein may be critical to the maintenance of tissue homeostasis by suppressing the proliferation of quiescent tissues in response to angiogenic factors, and by inhibiting protease activity in tissues undergoing remodelling of the extracellular matrix. TIMP metallopeptidase inhibitor 2 TIMP2 ENSG00000035862 NA
26585 This gene encodes a member of the BMP (bone morphogenic protein) antagonist family. Like BMPs, BMP antagonists contain cystine knots and typically form homo- and heterodimers. The CAN (cerberus and dan) subfamily of BMP antagonists, to which this gene belongs, is characterized by a C-terminal cystine knot with an eight-membered ring. The antagonistic effect of the secreted glycosylated protein encoded by this gene is likely due to its direct binding to BMP proteins. As an antagonist of BMP, this gene may play a role in regulating organogenesis, body patterning, and tissue differentiation. In mouse, this protein has been shown to relay the sonic hedgehog (SHH) signal from the polarizing region to the apical ectodermal ridge during limb bud outgrowth. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. gremlin 1, DAN family BMP antagonist GREM1 ENSG00000166923 NA
5502 NA protein phosphatase 1 regulatory inhibitor subunit 1A PPP1R1A ENSG00000135447 NA
NA NA NA NA ENSG00000259716 TRUE
4155 The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. myelin basic protein MBP ENSG00000197971 NA
3315 The protein encoded by this gene is induced by environmental stress and developmental changes. The encoded protein is involved in stress resistance and actin organization and translocates from the cytoplasm to the nucleus upon stress induction. Defects in this gene are a cause of Charcot-Marie-Tooth disease type 2F (CMT2F) and distal hereditary motor neuropathy (dHMN). heat shock protein family B (small) member 1 HSPB1 ENSG00000106211 NA
116496 NA family with sequence similarity 129 member A FAM129A ENSG00000135842 NA
57153 NA solute carrier family 44 member 2 SLC44A2 ENSG00000129353 NA
4256 The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. matrix Gla protein MGP ENSG00000111341 NA
26986 This gene encodes a poly(A) binding protein. The protein shuttles between the nucleus and cytoplasm and binds to the 3’ poly(A) tail of eukaryotic messenger RNAs via RNA-recognition motifs. The binding of this protein to poly(A) promotes ribosome recruitment and translation initiation; it is also required for poly(A) shortening which is the first step in mRNA decay. The gene is part of a small gene family including three protein-coding genes and several pseudogenes. poly(A) binding protein cytoplasmic 1 PABPC1 ENSG00000070756 NA
3336 This gene encodes a major heat shock protein which functions as a chaperonin. Its structure consists of a heptameric ring which binds to another heat shock protein in order to form a symmetric, functional heterodimer which enhances protein folding in an ATP-dependent manner. This gene and its co-chaperonin, HSPD1, are arranged in a head-to-head orientation on chromosome 2. Naturally occurring read-through transcription occurs between this locus and the neighboring locus MOBKL3. heat shock protein family E (Hsp10) member 1 HSPE1 ENSG00000115541 NA
# Save the Ensembl IDs queried for factor 6, one per line, for downstream use
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 6, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
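
The Factor 6 table above contains one query (ENSG00000259716) flagged in the notfound column, and the export above writes every queried ID regardless. If only annotated genes are wanted downstream, the rows could be filtered first. The following is a minimal sketch under assumptions, not the analysis actually run here: `out_toy` is a hypothetical stand-in with the same column names as the table, and the output path is a temporary file rather than the real `../utilities/` location.

```r
# Toy stand-in for the data frame returned by the annotation query;
# column names mirror the table above, the three rows are illustrative only.
out_toy <- data.frame(
  query    = c("ENSG00000112715", "ENSG00000259716", "ENSG00000155657"),
  symbol   = c("VEGFA", NA, "TTN"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Keep only matched queries: notfound is NA for hits and TRUE for misses.
found <- out_toy[is.na(out_toy$notfound) | !out_toy$notfound, ]

# Write one identifier per line to a temporary file (hypothetical path).
out_file <- tempfile(fileext = ".txt")
write.table(unique(found$query), out_file,
            col.names = FALSE, row.names = FALSE, quote = FALSE)

# Read the file back to inspect what was written.
written_ids <- readLines(out_file)
written_ids
```

Whether unmatched IDs should be dropped depends on what consumes the file; tools that accept raw Ensembl IDs may handle them on their own, in which case the unfiltered export is sufficient.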

Factor 7 Annotations

out <- mygene::queryMany(gene_list[7,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
kable(as.data.frame(out))
summary X_id symbol name query
The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. 3860 KRT13 keratin 13 ENSG00000171401
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3851 KRT4 keratin 4 ENSG00000170477
NA 6707 SPRR3 small proline rich protein 3 ENSG00000163209
NA ENSG00000229732 AC019349.5 NA ENSG00000229732
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. As many as six of this type II cytokeratin (KRT6) have been identified; the multiplicity of the genes is attributed to successive gene duplication events. The genes are expressed with family members KRT16 and/or KRT17 in the filiform papillae of the tongue, the stratified epithelial lining of oral mucosa and esophagus, the outer root sheath of hair follicles, and the glandular epithelia. This KRT6 gene in particular encodes the most abundant isoform. Mutations in these genes have been associated with pachyonychia congenita. In addition, peptides from the C-terminal region of the protein have antimicrobial activity against bacterial pathogens. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3853 KRT6A keratin 6A ENSG00000205420
The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. 6280 S100A9 S100 calcium binding protein A9 ENSG00000163220
This gene encodes a membrane-localized protein that binds phospholipids. This protein inhibits phospholipase A2 and has anti-inflammatory activity. Loss of function or expression of this gene has been detected in multiple tumors. 301 ANXA1 annexin A1 ENSG00000135046
This gene encodes a member of the ‘fused gene’ family of proteins, which contain N-terminus EF-hand domains and multiple tandem peptide repeats. The encoded protein contains two EF-hand Ca2+ binding domains in its N-terminus and two glutamine- and threonine-rich 60 amino acid repeats in its C-terminus. This gene, also known as squamous epithelial heat shock protein 53, may play a role in the mucosal/epithelial immune response and epidermal differentiation. 49860 CRNN cornulin ENSG00000143536
NA 51458 RHCG Rh family C glycoprotein ENSG00000140519
The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and as a cytokine. Altered expression of this protein is associated with the disease cystic fibrosis. Multiple transcript variants encoding different isoforms have been found for this gene. 6279 S100A8 S100 calcium binding protein A8 ENSG00000143546
The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and kininogens. This gene encodes a stefin that functions as an intracellular thiol protease inhibitor. The protein is able to form a dimer stabilized by noncovalent forces, inhibiting papain and cathepsins L, H and B. The protein is thought to play a role in protecting against the proteases leaking from lysosomes. Evidence indicates that mutations in this gene are responsible for the primary defects in patients with progressive myoclonic epilepsy (EPM1). 1476 CSTB cystatin B ENSG00000160213
NA 2012 EMP1 epithelial membrane protein 1 ENSG00000134531
This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. 3858 KRT10 keratin 10 ENSG00000186395
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3848 KRT1 keratin 1 ENSG00000167768
This gene encodes a member of the S100 protein family which contains an EF-hand motif and binds calcium. The gene is located in a cluster of S100 genes on chromosome 1. Levels of the encoded protein have been found to be lower in cancerous tissue and associated with metastasis suggesting a tumor suppressor function (PMID: 19956863, 19351828). 57402 S100A14 S100 calcium binding protein A14 ENSG00000189334
The protein encoded by this gene is a highly hydrophobic integral membrane protein belonging to the MAL family of proteolipids. The protein has been localized to the endoplasmic reticulum of T-cells and is a candidate linker protein in T-cell signal transduction. In addition, this proteolipid is localized in compact myelin of cells in the nervous system and has been implicated in myelin biogenesis and/or function. The protein plays a role in the formation, stabilization and maintenance of glycosphingolipid-enriched membrane microdomains. Down-regulation of this gene has been associated with a variety of human epithelial malignancies. Alternative splicing produces four transcript variants which vary from each other by the presence or absence of alternatively spliced exons 2 and 3. 4118 MAL mal, T-cell differentiation protein ENSG00000172005
Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene consists of two polypeptide chains activated from a single precursor protein by proteolysis. The encoded protein is involved in the later stages of cell envelope formation in the epidermis and hair follicle. 7053 TGM3 transglutaminase 3 ENSG00000125780
NA 6700 SPRR2A small proline rich protein 2A ENSG00000241794
The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins, and kininogens. This gene encodes a stefin that functions as a cysteine protease inhibitor, forming tight complexes with papain and the cathepsins B, H, and L. The protein is one of the precursor proteins of cornified cell envelope in keratinocytes and plays a role in epidermal development and maintenance. Stefins have been proposed as prognostic and diagnostic tools for cancer. 1475 CSTA cystatin A ENSG00000121552
NA 6698 SPRR1A small proline rich protein 1A ENSG00000169474
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3849 KRT2 keratin 2 ENSG00000172867
This gene encodes a soluble protein that is involved in endochondral bone formation, angiogenesis, and tumor biology. It also interacts with a variety of extracellular and structural proteins, contributing to the maintenance of skin integrity and homeostasis. Mutations in this gene are associated with lipoid proteinosis disorder (also known as hyalinosis cutis et mucosae or Urbach-Wiethe disease) that is characterized by generalized thickening of skin, mucosae and certain viscera. Alternatively spliced transcript variants encoding distinct isoforms have been described for this gene. 1893 ECM1 extracellular matrix protein 1 ENSG00000143369
NA 64855 FAM129B family with sequence similarity 129 member B ENSG00000136830
The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in motility, invasion, and tubulin polymerization. Chromosomal rearrangements and altered expression of this gene have been implicated in tumor metastasis. 6282 S100A11 S100 calcium binding protein A11 ENSG00000163191
NA 140576 S100A16 S100 calcium binding protein A16 ENSG00000188643
The protein encoded by this gene is a member of the interleukin 1 cytokine family. This protein inhibits the activities of interleukin 1, alpha (IL1A) and interleukin 1, beta (IL1B), and modulates a variety of interleukin 1 related immune and inflammatory responses. This gene and five other closely related cytokine genes form a gene cluster spanning approximately 400 kb on chromosome 2. A polymorphism of this gene is reported to be associated with increased risk of osteoporotic fractures and gastric cancer. Several alternatively spliced transcript variants encoding distinct isoforms have been reported. 3557 IL1RN interleukin 1 receptor antagonist ENSG00000136689
This gene encodes a member of the gap junction protein family. The gap junctions were first characterized by electron microscopy as regionally specialized structures on plasma membranes of contacting adherent cells. These structures were shown to consist of cell-to-cell channels that facilitate the transfer of ions and small molecules between cells. The gap junction proteins, also known as connexins, purified from fractions of enriched gap junctions from different tissues differ. According to sequence similarities at the nucleotide and amino acid levels, the gap junction proteins are divided into two categories, alpha and beta. Mutations in this gene are responsible for as much as 50% of pre-lingual, recessive deafness. 2706 GJB2 gap junction protein beta 2 ENSG00000165474
This gene encodes a multidomain serine protease inhibitor that contains 15 potential inhibitory domains. The encoded preproprotein is proteolytically processed to generate multiple protein products, which may exhibit unique activities and specificities. These proteins may play a role in skin and hair morphogenesis, as well as anti-inflammatory and antimicrobial protection of mucous epithelia. Mutations in this gene may result in Netherton syndrome, a disorder characterized by ichthyosis, defective cornification, and atopy. This gene is present in a gene cluster on chromosome 5. Alternative splicing results in multiple transcript variants. 11005 SPINK5 serine peptidase inhibitor, Kazal type 5 ENSG00000133710
The protein encoded by this gene is a component of desmosomes and of the epidermal cornified envelope in keratinocytes. The N-terminal domain of this protein interacts with the plasma membrane and its C-terminus interacts with intermediate filaments. Through its rod domain, this protein forms complexes with envoplakin. This protein may serve as a link between the cornified envelope and desmosomes as well as intermediate filaments. AKT1/PKB, a protein kinase mediating a variety of cell growth and survival signaling processes, is reported to interact with this protein, suggesting a possible role for this protein as a localization signal in AKT1-mediated signaling. 5493 PPL periplakin ENSG00000118898
This gene encodes a member of the desmocollin protein subfamily. Desmocollins, along with desmogleins, are cadherin-like transmembrane glycoproteins that are major components of the desmosome. Desmosomes are cell-cell junctions that help resist shearing forces and are found in high concentrations in cells subject to mechanical stress. This gene is found in a cluster with other desmocollin family members on chromosome 18. Mutations in this gene are associated with arrhythmogenic right ventricular dysplasia-11, and reduced protein expression has been described in several types of cancer. Alternative splicing results in multiple transcript variants. 1824 DSC2 desmocollin 2 ENSG00000134755
NA ENSG00000234964 FABP5P7 fatty acid binding protein 5 pseudogene 7 ENSG00000234964
The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may have a tumor suppressor function. Chromosomal rearrangements and altered expression of this gene have been implicated in breast cancer. 6273 S100A2 S100 calcium binding protein A2 ENSG00000196754
NA 2810 SFN stratifin ENSG00000175793
This gene encodes an elastase-specific inhibitor that functions as an antimicrobial peptide against Gram-positive and Gram-negative bacteria, and fungal pathogens. The protein contains a WAP-type four-disulfide core (WFDC) domain, and is thus a member of the WFDC domain family. Most WFDC gene members are localized to chromosome 20q12-q13 in two clusters: centromeric and telomeric. This gene belongs to the centromeric cluster. Expression of this gene is upregulated by bacterial lipopolysaccharides and cytokines. 5266 PI3 peptidase inhibitor 3 ENSG00000124102
The protein encoded by this gene is a member of the keratin family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. The type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. Unlike its related family members, this smallest known acidic cytokeratin is not paired with a basic cytokeratin in epithelial cells. It is specifically expressed in the periderm, the transiently superficial layer that envelops the developing epidermis. The type I cytokeratins are clustered in a region of chromosome 17q12-q21. 3880 KRT19 keratin 19 ENSG00000171345
The protein encoded by this gene is a membrane protein that catalyzes the addition of an alkyl group from an alkylamine to a glutamine residue of a protein, forming an alkylglutamine in the protein. This protein alkylation leads to crosslinking of proteins and catenation of polyamines to proteins. This gene contains either one or two copies of a 22 nt repeat unit in its 3’ UTR. Mutations in this gene have been associated with autosomal recessive lamellar ichthyosis (LI) and nonbullous congenital ichthyosiform erythroderma (NCIE). 7051 TGM1 transglutaminase 1 ENSG00000092295
6-phosphogluconate dehydrogenase is the second dehydrogenase in the pentose phosphate shunt. Deficiency of this enzyme is generally asymptomatic, and the inheritance of this disorder is autosomal dominant. Hemolysis results from combined deficiency of 6-phosphogluconate dehydrogenase and 6-phosphogluconolactonase suggesting a synergism of the two enzymopathies. Several transcript variants encoding different isoforms have been found for this gene. 5226 PGD phosphogluconate dehydrogenase ENSG00000142657
This gene encodes a protein that belongs to the lipocalin family. Members of this family transport small hydrophobic molecules such as lipids, steroid hormones and retinoids. The protein encoded by this gene is a neutrophil gelatinase-associated lipocalin and plays a role in innate immunity by limiting bacterial growth as a result of sequestering iron-containing siderophores. The presence of this protein in blood and urine is an early biomarker of acute kidney injury. This protein is thought to be involved in multiple cellular processes, including maintenance of skin homeostasis, and suppression of invasiveness and metastasis. Mice lacking this gene are more susceptible to bacterial infection than wild type mice. 3934 LCN2 lipocalin 2 ENSG00000148346
The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains and are clustered in a region on chromosome 17q21.2. 3866 KRT15 keratin 15 ENSG00000171346
This gene encodes a secreted inhibitor which protects epithelial tissues from serine proteases. It is found in various secretions including seminal plasma, cervical mucus, and bronchial secretions, and has affinity for trypsin, leukocyte elastase, and cathepsin G. Its inhibitory effect contributes to the immune response by protecting epithelial surfaces from attack by endogenous proteolytic enzymes. This antimicrobial protein has antibacterial, antifungal and antiviral activity. 6590 SLPI secretory leukocyte peptidase inhibitor ENSG00000124107
NA 388 RHOB ras homolog family member B ENSG00000143878
This gene encodes the water channel protein aquaporin 3. Aquaporins are a family of small integral membrane proteins related to the major intrinsic protein, also known as aquaporin 0. Aquaporin 3 is localized at the basal lateral membranes of collecting duct cells in the kidney. In addition to its water channel function, aquaporin 3 has been found to facilitate the transport of nonionic small solutes such as urea and glycerol, but to a smaller degree. It has been suggested that water channels can be functionally heterogeneous and possess water and solute permeation mechanisms. Alternative splicing of this gene results in multiple transcript variants encoding different isoforms. 360 AQP3 aquaporin 3 (Gill blood group) ENSG00000165272
The protein encoded by this gene is a member of the serpin family of proteinase inhibitors. Members of this family maintain homeostasis by neutralizing overexpressed proteinase activity through their function as suicide substrates. This protein inhibits the neutrophil-derived proteinases neutrophil elastase, cathepsin G, and proteinase-3 and thus protects tissues from damage at inflammatory sites. Alternative splicing results in multiple transcript variants. 1992 SERPINB1 serpin family B member 1 ENSG00000021355
Aldehyde dehydrogenases oxidize various aldehydes to the corresponding acids. They are involved in the detoxification of alcohol-derived acetaldehyde and in the metabolism of corticosteroids, biogenic amines, neurotransmitters, and lipid peroxidation. The enzyme encoded by this gene forms a cytoplasmic homodimer that preferentially oxidizes aromatic and medium-chain (6 carbons or more) saturated and unsaturated aldehyde substrates. It is thought to promote resistance to UV and 4-hydroxy-2-nonenal-induced oxidative damage in the cornea. The gene is located within the Smith-Magenis syndrome region on chromosome 17. Multiple alternatively spliced variants, encoding the same protein, have been identified. 218 ALDH3A1 aldehyde dehydrogenase 3 family member A1 ENSG00000108602
This intronless gene encodes a carcinoma-associated antigen. This antigen is a cell surface receptor that transduces calcium signals. Mutations of this gene have been associated with gelatinous drop-like corneal dystrophy. 4070 TACSTD2 tumor-associated calcium signal transducer 2 ENSG00000184292
The protein encoded by this gene is an envelope protein of keratinocytes. The encoded protein is crosslinked to membrane proteins by transglutaminase, forming an insoluble layer under the plasma membrane. This protein is proline-rich and contains several tandem amino acid repeats. 6699 SPRR1B small proline rich protein 1B ENSG00000169469
Granulins are a family of secreted, glycosylated peptides that are cleaved from a single precursor protein with 7.5 repeats of a highly conserved 12-cysteine granulin/epithelin motif. The 88 kDa precursor protein, progranulin, is also called proepithelin and PC cell-derived growth factor. Cleavage of the signal peptide produces mature granulin which can be further cleaved into a variety of active, 6 kDa peptides. These smaller cleavage products are named granulin A, granulin B, granulin C, etc. Epithelins 1 and 2 are synonymous with granulins A and B, respectively. Both the peptides and intact granulin protein regulate cell growth. However, different members of the granulin protein family may act as inhibitors, stimulators, or have dual actions on cell growth. Granulin family members are important in normal development, wound healing, and tumorigenesis. 2896 GRN granulin ENSG00000030582
This gene encodes loricrin, a major protein component of the cornified cell envelope found in terminally differentiated epidermal cells. Mutations in this gene are associated with Vohwinkel’s syndrome and progressive symmetric erythrokeratoderma, both inherited skin diseases. 4014 LOR loricrin ENSG00000203782
This gene encodes a member of the retinoic acid (RA, a form of vitamin A) binding protein family and lipocalin/cytosolic fatty-acid binding protein family. The protein is a cytosol-to-nuclear shuttling protein, which facilitates RA binding to its cognate receptor complex and transfer to the nucleus. It is involved in the retinoid signaling pathway, and is associated with increased circulating low-density lipoprotein cholesterol. Alternatively spliced transcript variants encoding the same protein have been found for this gene. 1382 CRABP2 cellular retinoic acid binding protein 2 ENSG00000143320
This gene encodes a member of the annexin family. Members of this calcium-dependent phospholipid-binding protein family play a role in the regulation of cellular growth and in signal transduction pathways. This protein functions as an autocrine factor which heightens osteoclast formation and bone resorption. This gene has three pseudogenes located on chromosomes 4, 9 and 10, respectively. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. 302 ANXA2 annexin A2 ENSG00000182718
The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in exocytosis and endocytosis. 6281 S100A10 S100 calcium binding protein A10 ENSG00000197747
This gene is located within a large protease gene cluster on chromosome 16. It belongs to the group-1 subfamily of serine proteases. The encoded protein is a secreted tryptic serine protease and is expressed mainly in the pancreas. Alternative splicing results in multiple transcript variants. 83886 PRSS27 protease, serine 27 ENSG00000172382
This gene encodes a protein that is related to epidermal growth factor receptor pathway substrate 8 (EPS8), a substrate for the epidermal growth factor receptor. The function of this protein is unknown. At least two alternatively spliced transcript variants encoding different isoforms have been found for this gene. 54869 EPS8L1 EPS8 like 1 ENSG00000131037
This gene encodes a member of the Armadillo protein family, which function in adhesion between cells and signal transduction. Multiple translation initiation codons and alternative splicing result in many different isoforms being translated. Not all of the full-length natures of the described transcript variants have been determined. Read-through transcription also exists between this gene and the neighboring upstream thioredoxin-related transmembrane protein 2 (TMX2) gene. 1500 CTNND1 catenin delta 1 ENSG00000198561
This gene encodes the fatty acid binding protein found in epidermal cells, and was first identified as being upregulated in psoriasis tissue. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. FABPs may play roles in fatty acid uptake, transport, and metabolism. Polymorphisms in this gene are associated with type 2 diabetes. The human genome contains many pseudogenes similar to this locus. 2171 FABP5 fatty acid binding protein 5 ENSG00000164687
NA 810 CALML3 calmodulin like 3 ENSG00000178363
NA 84518 CNFN cornifelin ENSG00000105427
Thymus development depends on a complex series of interactions between thymocytes and the stromal component of the organ. Epithelial V-like antigen (EVA) is expressed in thymus epithelium and strongly downregulated by thymocyte developmental progression. This gene is expressed in the thymus and in several epithelial structures early in embryogenesis. It is highly homologous to the myelin protein zero and, in thymus-derived epithelial cell lines, is poorly soluble in nonionic detergents, strongly suggesting an association to the cytoskeleton. Its capacity to mediate cell adhesion through a homophilic interaction and its selective regulation by T cell maturation might imply the participation of EVA in the earliest phases of thymus organogenesis. The protein bears a characteristic V-type domain and two potential N-glycosylation sites in the extracellular domain; a putative serine phosphorylation site for casein kinase 2 is also present in the cytoplasmic tail. Two transcript variants encoding the same protein have been found for this gene. 10205 MPZL2 myelin protein zero like 2 ENSG00000149573
This gene encodes a member of the EPS8 gene family. The encoded protein, like other members of the family, is thought to link growth factor stimulation to actin organization, generating functional redundancy in the pathways that regulate actin cytoskeletal remodeling. 64787 EPS8L2 EPS8 like 2 ENSG00000177106
RAB10 belongs to the RAS (see HRAS; MIM 190020) superfamily of small GTPases. RAB proteins localize to exocytic and endocytic compartments and regulate intracellular vesicle trafficking (Bao et al., 1998 [PubMed 9918381]). 10890 RAB10 RAB10, member RAS oncogene family ENSG00000084733
The protein encoded by this gene is required for the reduction of fatty acids to fatty alcohols, a process that is required for the synthesis of monoesters and ether lipids. NADPH is required as a cofactor in this reaction, and 16-18 carbon saturated and unsaturated fatty acids are the preferred substrate. This is a peroxisomal membrane protein, and studies suggest that the N-terminus contains a large catalytic domain located on the outside of the peroxisome, while the C-terminus is exposed to the matrix of the peroxisome. Studies indicate that the regulation of this protein is dependent on plasmalogen levels. Mutations in this gene have been associated with individuals affected by severe intellectual disability, early-onset epilepsy, microcephaly, congenital cataracts, growth retardation, and spasticity (PMID: 25439727). A pseudogene of this gene is located on chromosome 13. 84188 FAR1 fatty acyl-CoA reductase 1 ENSG00000197601
This gene belongs to the ephrin receptor subfamily of the protein-tyrosine kinase family. EPH and EPH-related receptors have been implicated in mediating developmental events, particularly in the nervous system. Receptors in the EPH subfamily typically have a single kinase domain and an extracellular region containing a Cys-rich domain and 2 fibronectin type III repeats. The ephrin receptors are divided into 2 groups based on the similarity of their extracellular domain sequences and their affinities for binding ephrin-A and ephrin-B ligands. This gene encodes a protein that binds ephrin-A ligands. Mutations in this gene are the cause of certain genetically-related cataract disorders. 1969 EPHA2 EPH receptor A2 ENSG00000142627
NA 55076 TMEM45A transmembrane protein 45A ENSG00000181458
This gene encodes a cytoskeletal LIM protein that binds to actin filaments via a domain that is homologous to erythrocyte dematin. LIM domains, found in over 60 proteins, play key roles in the regulation of developmental pathways. LIM domains also function as protein-binding interfaces, mediating specific protein-protein interactions. The protein encoded by this gene could mediate such interactions between actin filaments and cytoplasmic targets. Alternatively spliced transcript variants encoding different isoforms have been identified. 3983 ABLIM1 actin binding LIM protein 1 ENSG00000099204
NA ENSG00000249007 RP11-510N19.5 NA ENSG00000249007
NA 54544 CRCT1 cysteine rich C-terminal 1 ENSG00000169509
NA 202 AIM1 absent in melanoma 1 ENSG00000112297
This gene encodes a novel calcium binding protein expressed in the epidermis and related to the calmodulin family of calcium binding proteins. Functional studies with recombinant protein demonstrate it does bind calcium and undergoes a conformational change when it does so. Abundant expression is detected only in reconstructed epidermis and is restricted to differentiating keratinocytes. In addition, it can associate with transglutaminase 3, shown to be a key enzyme in the terminal differentiation of keratinocytes. 51806 CALML5 calmodulin like 5 ENSG00000178372
NA 151516 ASPRV1 aspartic peptidase, retroviral-like 1 ENSG00000244617
This gene encodes a member of a large family of proteins that activate Rho-type guanosine triphosphate (GTP) metabolizing enzymes. The encoded protein may play a role in clathrin-mediated endocytosis. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 201176 ARHGAP27 Rho GTPase activating protein 27 ENSG00000159314
This gene encodes a protein that contains domains of thioredoxin and ERV1, members of two long-standing gene families. The gene expression is induced as fibroblasts begin to exit the proliferative cycle and enter quiescence, suggesting that this gene plays an important role in growth regulation. Two transcript variants encoding two different isoforms have been found for this gene. 5768 QSOX1 quiescin sulfhydryl oxidase 1 ENSG00000116260
NA 30001 ERO1A endoplasmic reticulum oxidoreductase alpha ENSG00000197930
The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. LGALS3BP has been found elevated in the serum of patients with cancer and in those infected by the human immunodeficiency virus (HIV). It appears to be implicated in immune response associated with natural killer (NK) and lymphokine-activated killer (LAK) cell cytotoxicity. Using fluorescence in situ hybridization the full length 90K cDNA has been localized to chromosome 17q25. The native protein binds specifically to a human macrophage-associated lectin known as Mac-2 and also binds galectin 1. 3959 LGALS3BP galectin 3 binding protein ENSG00000108679
GIPC1 is a scaffolding protein that regulates cell surface receptor expression and trafficking (Lee et al., 2008 [PubMed 18775991]). 10755 GIPC1 GIPC PDZ domain containing family member 1 ENSG00000123159
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the basal layer of the epidermis with family member KRT14. Mutations in these genes have been associated with a complex of diseases termed epidermolysis bullosa simplex. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3852 KRT5 keratin 5 ENSG00000186081
This gene encodes a member of the prominin family of pentaspan membrane glycoproteins. The encoded protein localizes to basal epithelial cells and may be involved in the organization of plasma membrane microdomains. Alternative splicing results in multiple transcript variants. 150696 PROM2 prominin 2 ENSG00000155066
This gene encodes a member of the carboxylesterase large family. The family members are responsible for the hydrolysis or transesterification of various xenobiotics, such as cocaine and heroin, and endogenous substrates with ester, thioester, or amide bonds. They may participate in fatty acyl and cholesterol ester metabolism, and may play a role in the blood-brain barrier system. The protein encoded by this gene is the major intestinal enzyme and functions in intestine drug clearance. Alternatively spliced transcript variants have been found for this gene. 8824 CES2 carboxylesterase 2 ENSG00000172831
The protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intra-cellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the MRP subfamily which is involved in multi-drug resistance. This protein functions in the cellular export of its substrate, cyclic nucleotides. This export contributes to the degradation of phosphodiesterases and possibly an elimination pathway for cyclic nucleotides. Studies show that this protein provides resistance to thiopurine anticancer drugs, 6-mercaptopurine and thioguanine, and the anti-HIV drug 9-(2-phosphonylmethoxyethyl)adenine. This protein may be involved in resistance to thiopurines in acute lymphoblastic leukemia and antiretroviral nucleoside analogs in HIV-infected patients. Alternative splicing results in multiple transcript variants. 10057 ABCC5 ATP binding cassette subfamily C member 5 ENSG00000114770
This gene encodes a member of the carcinoembryonic antigen (CEA) gene family, which belongs to the immunoglobulin superfamily. Two subgroups of the CEA family, the CEA cell adhesion molecules and the pregnancy-specific glycoproteins, are located within a 1.2 Mb cluster on the long arm of chromosome 19. Eleven pseudogenes of the CEA cell adhesion molecule subgroup are also found in the cluster. The encoded protein was originally described in bile ducts of liver as biliary glycoprotein. Subsequently, it was found to be a cell-cell adhesion molecule detected on leukocytes, epithelia, and endothelia. The encoded protein mediates cell adhesion via homophilic as well as heterophilic binding to other proteins of the subgroup. Multiple cellular activities have been attributed to the encoded protein, including roles in the differentiation and arrangement of tissue three-dimensional structure, angiogenesis, apoptosis, tumor suppression, metastasis, and the modulation of innate and adaptive immune responses. Multiple transcript variants encoding different isoforms have been reported, but the full-length nature of all variants has not been defined. 634 CEACAM1 carcinoembryonic antigen related cell adhesion molecule 1 ENSG00000079385
This antimicrobial gene encodes a secreted protein that is subsequently processed into mature peptides of distinct biological activities. The C-terminal peptide is constitutively expressed in sweat and has antibacterial and antifungal activities. The N-terminal peptide, also known as diffusible survival evasion peptide, promotes neural cell survival under conditions of severe oxidative stress. A glycosylated form of the N-terminal peptide may be associated with cachexia (muscle wasting) in cancer patients. Alternative splicing results in multiple transcript variants encoding different isoforms. 117159 DCD dermcidin ENSG00000161634
This gene is a member of the NAD(P)H dehydrogenase (quinone) family and encodes a cytoplasmic 2-electron reductase. This FAD-binding protein forms homodimers and reduces quinones to hydroquinones. This protein’s enzymatic activity prevents the one electron reduction of quinones that results in the production of radical species. Mutations in this gene have been associated with tardive dyskinesia (TD), an increased risk of hematotoxicity after exposure to benzene, and susceptibility to various forms of cancer. Altered expression of this protein has been seen in many tumors and is also associated with Alzheimer’s disease (AD). Alternate transcriptional splice variants, encoding different isoforms, have been characterized. 1728 NQO1 NAD(P)H quinone dehydrogenase 1 ENSG00000181019
This gene encodes the heavy chain subunit of the pre-alpha-trypsin inhibitor complex. This complex may stabilize the extracellular matrix through its ability to bind hyaluronic acid. Polymorphisms of this gene may be associated with increased risk for schizophrenia and major depressive disorder. This gene is present in an inter-alpha-trypsin inhibitor family gene cluster on chromosome 3. 3699 ITIH3 inter-alpha-trypsin inhibitor heavy chain 3 ENSG00000162267
Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. 213 ALB albumin ENSG00000163631
The protein encoded by this gene acts as a homodimer and is involved in many redox reactions. The encoded protein is active in the reversible S-nitrosylation of cysteines in certain proteins, which is part of the response to intracellular nitric oxide. This protein is found in the cytoplasm. Two transcript variants encoding different isoforms have been found for this gene. 7295 TXN thioredoxin ENSG00000136810
The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains and are clustered in a region of chromosome 17q12-q21. This keratin has been coexpressed with keratin 14 in a number of epithelial tissues, including esophagus, tongue, and hair follicles. Mutations in this gene are associated with type 1 pachyonychia congenita, non-epidermolytic palmoplantar keratoderma and unilateral palmoplantar verrucous nevus. 3868 KRT16 keratin 16 ENSG00000186832
The protein encoded by this gene is a member of the RAS superfamily of small GTPases. The encoded protein is involved in membrane trafficking and cell survival. This gene has been found to be a tumor suppressor and an oncogene, depending on the context. Two variants, one protein-coding and the other not, have been found for this gene. 57111 RAB25 RAB25, member RAS oncogene family ENSG00000132698
This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. 7431 VIM vimentin ENSG00000026025
This gene encodes a protein which may function in the regulation of keratinocyte differentiation and maintenance of stratified epithelia. Multiple transcript variants encoding different isoforms have been found for this gene. 388533 KRTDAP keratinocyte differentiation associated protein ENSG00000188508
The calpains, calcium-activated neutral proteases, are nonlysosomal, intracellular cysteine proteases. The mammalian calpains include ubiquitous, stomach-specific, and muscle-specific proteins. The ubiquitous enzymes consist of heterodimers with distinct large, catalytic subunits associated with a common small, regulatory subunit. This gene encodes the large subunit of the ubiquitous enzyme, calpain 2. Multiple heterogeneous transcriptional start sites in the 5’ UTR have been reported. Two transcript variants encoding different isoforms have been found for this gene. 824 CAPN2 calpain 2 ENSG00000162909
The calpains, calcium-activated neutral proteases, are nonlysosomal, intracellular cysteine proteases. The mammalian calpains include ubiquitous, stomach-specific, and muscle-specific proteins. The ubiquitous enzymes consist of heterodimers with distinct large, catalytic subunits associated with a common small, regulatory subunit. This gene encodes the large subunit of the ubiquitous enzyme, calpain 1. Several transcript variants encoding two different isoforms have been found for this gene. 823 CAPN1 calpain 1 ENSG00000014216
This gene product belongs to the 14-3-3 family of proteins which mediate signal transduction by binding to phosphoserine-containing proteins. This highly conserved protein family is found in both plants and mammals, and this protein is 99% identical to the mouse, rat and sheep orthologs. The encoded protein interacts with IRS1 protein, suggesting a role in regulating insulin sensitivity. Several transcript variants that differ in the 5’ UTR but that encode the same protein have been identified for this gene. 7534 YWHAZ tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein zeta ENSG00000164924
NA 54055 CYP4F29P cytochrome P450 family 4 subfamily F member 29, pseudogene ENSG00000228314
This gene encodes a glycosylphosphatidylinositol-anchored cell membrane glycoprotein. In addition to being highly expressed in the prostate it is also expressed in the bladder, placenta, colon, kidney, and stomach. This gene is up-regulated in a large proportion of prostate cancers and is also detected in cancers of the bladder and pancreas. This gene includes a polymorphism that results in an upstream start codon in some individuals; this polymorphism is thought to be associated with a risk for certain gastric and bladder cancers. Alternative splicing results in multiple transcript variants. 8000 PSCA prostate stem cell antigen ENSG00000167653
The protein encoded by this gene is similar to oxidoreductases, which are enzymes involved in cellular responses to oxidative stresses and irradiation. This gene is induced by the tumor suppressor p53 and is thought to be involved in p53-mediated cell death. It contains a p53 consensus binding site in its promoter region and a downstream pentanucleotide microsatellite sequence. P53 has been shown to transcriptionally activate this gene by interacting with the downstream pentanucleotide microsatellite sequence. The microsatellite is polymorphic, with a varying number of pentanucleotide repeats directly correlated with the extent of transcriptional activation by p53. It has been suggested that the microsatellite polymorphism may be associated with differential susceptibility to cancer. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 9540 TP53I3 tumor protein p53 inducible protein 3 ENSG00000115129
Tight junctions represent one mode of cell-to-cell adhesion in epithelial or endothelial cell sheets, forming continuous seals around cells and serving as a physical barrier to prevent solutes and water from passing freely through the paracellular space. These junctions are comprised of sets of continuous networking strands in the outwardly facing cytoplasmic leaflet, with complementary grooves in the inwardly facing extracytoplasmic leaflet. The protein encoded by this gene, a member of the claudin family, is an integral membrane protein and a component of tight junction strands. Loss of function mutations result in neonatal ichthyosis-sclerosing cholangitis syndrome. 9076 CLDN1 claudin 1 ENSG00000163347
This gene encodes a member of the claudin family. Claudins are integral membrane proteins and components of tight junction strands. Tight junction strands serve as a physical barrier to prevent solutes and water from passing freely through the paracellular space between epithelial or endothelial cell sheets, and also play critical roles in maintaining cell polarity and signal transductions. Differential expression of this gene has been observed in different types of malignancies, including breast cancer, ovarian cancer, hepatocellular carcinomas, urinary tumors, prostate cancer, lung cancer, head and neck cancers, thyroid carcinomas, etc. Alternatively spliced transcript variants encoding different isoforms have been found. 1366 CLDN7 claudin 7 ENSG00000181885
This gene encodes a member of the annexin family. Members of this calcium-dependent phospholipid-binding protein family play a role in the regulation of cellular growth and in signal transduction pathways. This protein functions in the inhibition of phospholipase A2 and cleavage of inositol 1,2-cyclic phosphate to form inositol 1-phosphate. This protein may also play a role in anti-coagulation. 306 ANXA3 annexin A3 ENSG00000138772
This gene encodes a sterile alpha motif domain-containing protein. The encoded protein localizes to the cytoplasm and may play a role in regulating cell proliferation and apoptosis. Mutations in this gene are the cause of normophosphatemic familial tumoral calcinosis. Alternate splicing results in multiple transcript variants that encode the same protein. 54809 SAMD9 sterile alpha motif domain containing 9 ENSG00000205413
This gene encodes a member of the Kruppel-like factor subfamily of zinc finger proteins. The encoded protein is a transcriptional activator that binds directly to a specific recognition motif in the promoters of target genes. This protein acts downstream of multiple different signaling pathways and is regulated by post-translational modification. It may participate in both promoting and suppressing cell proliferation. Expression of this gene may be changed in a variety of different cancers and in cardiovascular disease. Alternative splicing results in multiple transcript variants. 688 KLF5 Kruppel like factor 5 ENSG00000102554
NA 147645 VSIG10L V-set and immunoglobulin domain containing 10 like ENSG00000186806
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_",7,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
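
The per-factor export above is repeated by hand for each factor with a hard-coded index. A minimal sketch of how that step could be wrapped in a loop is shown below; `write_factor_genes`, `toy_gene_list`, and the use of `tempdir()` are illustrative assumptions and not part of the original analysis, which writes `out$query` to `../utilities/GTEX2013_sparse_fac_sqrt/`.

```r
# Hypothetical helper (not in the original analysis): write the query IDs
# for factor k to <out_dir>/gene_names_clus_<k>.txt, one ID per line.
write_factor_genes <- function(ids, k, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(as.character(ids), path,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Toy stand-in for gene_list: one row per factor, using a few Ensembl IDs
# that appear in the annotation tables of this report.
toy_gene_list <- rbind(
  c("ENSG00000136810", "ENSG00000186832", "ENSG00000132698"),
  c("ENSG00000163017", "ENSG00000096696", "ENSG00000112378")
)

# Assumption: a scratch directory is used here instead of ../utilities/...
out_dir <- tempdir()

# Write one file per factor and collect the resulting paths.
paths <- vapply(seq_len(nrow(toy_gene_list)),
                function(k) write_factor_genes(toy_gene_list[k, ], k, out_dir),
                character(1))
```

In the real workflow the `ids` argument would be `out$query` from the `mygene::queryMany` call for factor `k`, so that the file index always matches the factor being annotated.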

Factor 8 Annotations

out <- mygene::queryMany(gene_list[8,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
kable(as.data.frame(out))
X_id symbol summary query name
72 ACTG2 Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. ENSG00000163017 actin, gamma 2, smooth muscle, enteric
1832 DSP This gene encodes a protein that anchors intermediate filaments to desmosomal plaques and forms an obligate component of functional desmosomes. Mutations in this gene are the cause of several cardiomyopathies and keratodermas, including skin fragility-woolly hair syndrome. Alternative splicing results in multiple transcript variants. ENSG00000096696 desmoplakin
64065 PERP NA ENSG00000112378 PERP, TP53 apoptosis effector
3855 KRT7 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the simple epithelia lining the cavities of the internal organs and in the gland ducts and blood vessels. The genes encoding the type II cytokeratins are clustered in a region of chromosome 12q12-q13. Alternative splicing may result in several transcript variants; however, not all variants have been fully described. ENSG00000135480 keratin 7
4625 MYH7 Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. ENSG00000092054 myosin, heavy chain 7, cardiac muscle, beta
84525 HOPX The protein encoded by this gene is a homeodomain protein that lacks certain conserved residues required for DNA binding. It was reported that choriocarcinoma cell lines and tissues failed to express this gene, which suggested the possible involvement of this gene in malignant conversion of placental trophoblasts. Studies in mice suggest that this protein may interact with serum response factor (SRF) and modulate SRF-dependent cardiac-specific gene expression and cardiac development. Multiple alternatively spliced transcript variants have been identified for this gene. ENSG00000171476 HOP homeobox
125 ADH1B The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000196616 alcohol dehydrogenase 1B (class I), beta polypeptide
1465 CSRP1 This gene encodes a member of the cysteine-rich protein (CSRP) family. This gene family includes a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this gene product occurs in proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Alternatively spliced transcript variants have been described. ENSG00000159176 cysteine and glycine rich protein 1
59 ACTA2 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. ENSG00000107796 actin, alpha 2, smooth muscle, aorta
23650 TRIM29 The protein encoded by this gene belongs to the TRIM protein family. It has multiple zinc finger motifs and a leucine zipper motif. It has been proposed to form homo- or heterodimers which are involved in nucleic acid binding. Thus, it may act as a transcriptional regulatory factor involved in carcinogenesis and/or differentiation. It may also function in the suppression of radiosensitivity since it is associated with ataxia telangiectasia phenotype. ENSG00000137699 tripartite motif containing 29
11187 PKP3 This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This protein may act in cellular desmosome-dependent adhesion and signaling pathways. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000184363 plakophilin 3
3860 KRT13 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. ENSG00000171401 keratin 13
53905 DUOX1 The protein encoded by this gene is a glycoprotein and a member of the NADPH oxidase family. The synthesis of thyroid hormone is catalyzed by a protein complex located at the apical membrane of thyroid follicular cells. This complex contains an iodide transporter, thyroperoxidase, and a peroxide generating system that includes proteins encoded by this gene and the similar DUOX2 gene. This protein is known as dual oxidase because it has both a peroxidase homology domain and a gp91phox domain. This protein generates hydrogen peroxide and thereby plays a role in the activity of thyroid peroxidase, lactoperoxidase, and in lactoperoxidase-mediated antimicrobial defense at mucosal surfaces. Two alternatively spliced transcript variants encoding the same protein have been described for this gene. ENSG00000137857 dual oxidase 1
960 CD44 The protein encoded by this gene is a cell-surface glycoprotein involved in cell-cell interactions, cell adhesion and migration. It is a receptor for hyaluronic acid (HA) and can also interact with other ligands, such as osteopontin, collagens, and matrix metalloproteinases (MMPs). This protein participates in a wide variety of cellular functions including lymphocyte activation, recirculation and homing, hematopoiesis, and tumor metastasis. Transcripts for this gene undergo complex alternative splicing that results in many functionally distinct isoforms, however, the full length nature of some of these variants has not been determined. Alternative splicing is the basis for the structural and functional diversity of this protein, and may be related to tumor metastasis. ENSG00000026508 CD44 molecule (Indian blood group)
54869 EPS8L1 This gene encodes a protein that is related to epidermal growth factor receptor pathway substrate 8 (EPS8), a substrate for the epidermal growth factor receptor. The function of this protein is unknown. At least two alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000131037 EPS8 like 1
6288 SAA1 This gene encodes a member of the serum amyloid A family of apolipoproteins. The encoded preproprotein is proteolytically processed to generate the mature protein. This protein is a major acute phase protein that is highly expressed in response to inflammation and tissue injury. This protein also plays an important role in HDL metabolism and cholesterol homeostasis. High levels of this protein are associated with chronic inflammatory diseases including atherosclerosis, rheumatoid arthritis, Alzheimer’s disease and Crohn’s disease. This protein may also be a potential biomarker for certain tumors. Alternate splicing results in multiple transcript variants that encode the same protein. A pseudogene of this gene is found on chromosome 11. ENSG00000173432 serum amyloid A1
5317 PKP1 This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This protein may be involved in molecular recruitment and stabilization during desmosome formation. Mutations in this gene have been associated with the ectodermal dysplasia/skin fragility syndrome. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000081277 plakophilin 1
2752 GLUL The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. ENSG00000135821 glutamate-ammonia ligase
9289 ADGRG1 This gene encodes a member of the G protein-coupled receptor family and regulates brain cortical patterning. The encoded protein binds specifically to transglutaminase 2, a component of tissue and tumor stroma implicated as an inhibitor of tumor progression. Mutations in this gene are associated with a brain malformation known as bilateral frontoparietal polymicrogyria. Alternative splicing results in multiple transcript variants. ENSG00000205336 adhesion G protein-coupled receptor G1
6319 SCD This gene encodes an enzyme involved in fatty acid biosynthesis, primarily the synthesis of oleic acid. The protein belongs to the fatty acid desaturase family and is an integral membrane protein located in the endoplasmic reticulum. Transcripts of approximately 3.9 and 5.2 kb, differing only by alternative polyadenylation signals, have been detected. A gene encoding a similar enzyme is located on chromosome 4 and a pseudogene of this gene is located on chromosome 17. ENSG00000099194 stearoyl-CoA desaturase
25946 ZNF385A Zinc finger proteins, such as ZNF385A, are regulatory proteins that act as transcription factors, bind single- or double-stranded RNA, or interact with other proteins (Sharma et al., 2004 [PubMed 15527981]). ENSG00000161642 zinc finger protein 385A
5265 SERPINA1 The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. ENSG00000197249 serpin family A member 1
374897 SBSN NA ENSG00000189001 suprabasin
93099 DMKN This gene is upregulated in inflammatory diseases, and it was first observed as expressed in the differentiated layers of skin. The most interesting aspect of this gene is the differential use of promoters and terminators to generate isoforms with unique cellular distributions and domain components. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. ENSG00000161249 dermokine
57111 RAB25 The protein encoded by this gene is a member of the RAS superfamily of small GTPases. The encoded protein is involved in membrane trafficking and cell survival. This gene has been found to be a tumor suppressor and an oncogene, depending on the context. Two variants, one protein-coding and the other not, have been found for this gene. ENSG00000132698 RAB25, member RAS oncogene family
7038 TG Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. ENSG00000042832 thyroglobulin
171024 SYNPO2 NA ENSG00000172403 synaptopodin 2
2261 FGFR3 This gene encodes a member of the fibroblast growth factor receptor (FGFR) family, with its amino acid sequence being highly conserved between members and among divergent species. FGFR family members differ from one another in their ligand affinities and tissue distribution. A full-length representative protein would consist of an extracellular region, composed of three immunoglobulin-like domains, a single hydrophobic membrane-spanning segment and a cytoplasmic tyrosine kinase domain. The extracellular portion of the protein interacts with fibroblast growth factors, setting in motion a cascade of downstream signals, ultimately influencing mitogenesis and differentiation. This particular family member binds acidic and basic fibroblast growth hormone and plays a role in bone development and maintenance. Mutations in this gene lead to craniosynostosis and multiple types of skeletal dysplasia. Three alternatively spliced transcript variants that encode different protein isoforms have been described. ENSG00000068078 fibroblast growth factor receptor 3
5617 PRL This gene encodes the anterior pituitary hormone prolactin. This secreted hormone is a growth regulator for many tissues, including cells of the immune system. It may also play a role in cell survival by suppressing apoptosis, and it is essential for lactation. Alternative splicing results in multiple transcript variants that encode the same protein. ENSG00000172179 prolactin
1158 CKM The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. ENSG00000104879 creatine kinase, M-type
1843 DUSP1 The expression of DUSP1 gene is induced in human skin fibroblasts by oxidative/heat stress and growth factors. It specifies a protein with structural features similar to members of the non-receptor-type protein-tyrosine phosphatase family, and which has significant amino-acid sequence similarity to a Tyr/Ser-protein phosphatase encoded by the late gene H1 of vaccinia virus. The bacterially expressed and purified DUSP1 protein has intrinsic phosphatase activity, and specifically inactivates mitogen-activated protein (MAP) kinase in vitro by the concomitant dephosphorylation of both its phosphothreonine and phosphotyrosine residues. Furthermore, it suppresses the activation of MAP kinase by oncogenic ras in extracts of Xenopus oocytes. Thus, DUSP1 may play an important role in the human cellular response to environmental stress as well as in the negative regulation of cellular proliferation. ENSG00000120129 dual specificity phosphatase 1
4070 TACSTD2 This intronless gene encodes a carcinoma-associated antigen. This antigen is a cell surface receptor that transduces calcium signals. Mutations of this gene have been associated with gelatinous drop-like corneal dystrophy. ENSG00000184292 tumor-associated calcium signal transducer 2
2597 GAPDH This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. ENSG00000111640 glyceraldehyde-3-phosphate dehydrogenase
213 ALB Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. ENSG00000163631 albumin
3309 HSPA5 The protein encoded by this gene is a member of the heat shock protein 70 (HSP70) family. It is localized in the lumen of the endoplasmic reticulum (ER), and is involved in the folding and assembly of proteins in the ER. As this protein interacts with many ER proteins, it may play a key role in monitoring protein transport through the cell. ENSG00000044574 heat shock protein family A (Hsp70) member 5
1401 CRP The protein encoded by this gene belongs to the pentaxin family. It is involved in several host defense related functions based on its ability to recognize foreign pathogens and damaged cells of the host and to initiate their elimination by interacting with humoral and cellular effector systems in the blood. Consequently, the level of this protein in plasma increases greatly during acute phase response to tissue injury, infection, or other inflammatory stimuli. ENSG00000132693 C-reactive protein, pentraxin-related
7169 TPM2 This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000198467 tropomyosin 2 (beta)
2243 FGA This gene encodes the alpha subunit of the coagulation factor fibrinogen, which is a component of the blood clot. Following vascular injury, the encoded preproprotein is proteolytically processed by thrombin during the conversion of fibrinogen to fibrin. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia, afibrinogenemia and renal amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. ENSG00000171560 fibrinogen alpha chain
3880 KRT19 The protein encoded by this gene is a member of the keratin family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. The type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. Unlike its related family members, this smallest known acidic cytokeratin is not paired with a basic cytokeratin in epithelial cells. It is specifically expressed in the periderm, the transiently superficial layer that envelopes the developing epidermis. The type I cytokeratins are clustered in a region of chromosome 17q12-q21. ENSG00000171345 keratin 19
7094 TLN1 This gene encodes a cytoskeletal protein that is concentrated in areas of cell-substratum and cell-cell contacts. The encoded protein plays a significant role in the assembly of actin filaments and in spreading and migration of various cell types, including fibroblasts and osteoclasts. It codistributes with integrins in the cell surface membrane in order to assist in the attachment of adherent cells to extracellular matrices and of lymphocytes to other cells. The N-terminus of this protein contains elements for localization to cell-extracellular matrix junctions. The C-terminus contains binding sites for proteins such as beta-1-integrin, actin, and vinculin. ENSG00000137076 talin 1
3848 KRT1 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000167768 keratin 1
2697 GJA1 This gene is a member of the connexin gene family. The encoded protein is a component of gap junctions, which are composed of arrays of intercellular channels that provide a route for the diffusion of low molecular weight materials from cell to cell. The encoded protein is the major protein of gap junctions in the heart that are thought to have a crucial role in the synchronized contraction of the heart and in embryonic development. A related intronless pseudogene has been mapped to chromosome 5. Mutations in this gene have been associated with oculodentodigital dysplasia, autosomal recessive craniometaphyseal dysplasia and heart malformations. ENSG00000152661 gap junction protein alpha 1
83959 SLC4A11 This gene encodes a voltage-regulated, electrogenic sodium-coupled borate cotransporter that is essential for borate homeostasis, cell growth and cell proliferation. Mutations in this gene have been associated with a number of endothelial corneal dystrophies including recessive corneal endothelial dystrophy 2, corneal dystrophy and perceptive deafness, and Fuchs endothelial corneal dystrophy. Multiple transcript variants encoding different isoforms have been described. ENSG00000088836 solute carrier family 4 member 11
3960 LGALS4 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. The expression of this gene is restricted to small intestine, colon, and rectum, and it is underexpressed in colorectal cancer. ENSG00000171747 galectin 4
4359 MPZ This gene is specifically expressed in Schwann cells of the peripheral nervous system and encodes a type I transmembrane glycoprotein that is a major structural protein of the peripheral myelin sheath. The encoded protein contains a large hydrophobic extracellular domain and a smaller basic intracellular domain, which are essential for the formation and stabilization of the multilamellar structure of the compact myelin. Mutations in this gene are associated with autosomal dominant form of Charcot-Marie-Tooth disease type 1 (CMT1B) and other polyneuropathies, such as Dejerine-Sottas syndrome (DSS) and congenital hypomyelinating neuropathy (CHN). A recent study showed that two isoforms are produced from the same mRNA by use of alternative in-frame translation termination codons via a stop codon readthrough mechanism. ENSG00000158887 myelin protein zero
476 ATP1A1 The protein encoded by this gene belongs to the family of P-type cation transport ATPases, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The catalytic subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes an alpha 1 subunit. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000163399 ATPase Na+/K+ transporting subunit alpha 1
7056 THBD The protein encoded by this intronless gene is an endothelial-specific type I membrane receptor that binds thrombin. This binding results in the activation of protein C, which degrades clotting factors Va and VIIIa and reduces the amount of thrombin generated. Mutations in this gene are a cause of thromboembolic disease, also known as inherited thrombophilia. ENSG00000178726 thrombomodulin
70 ACTC1 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). ENSG00000159251 actin, alpha, cardiac muscle 1
388533 KRTDAP This gene encodes a protein which may function in the regulation of keratinocyte differentiation and maintenance of stratified epithelia. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000188508 keratinocyte differentiation associated protein
10653 SPINT2 This gene encodes a transmembrane protein with two extracellular Kunitz domains that inhibits a variety of serine proteases. The protein inhibits HGF activator which prevents the formation of active hepatocyte growth factor. This gene is a putative tumor suppressor, and mutations in this gene result in congenital sodium diarrhea. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000167642 serine peptidase inhibitor, Kunitz type, 2
2335 FN1 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. ENSG00000115414 fibronectin 1
6285 S100B The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21; however, this gene is located at 21q22.3. This protein may function in Neurite extension, proliferation of melanoma cells, stimulation of Ca2+ fluxes, inhibition of PKC-mediated phosphorylation, astrocytosis and axonal proliferation, and inhibition of microtubule assembly. Chromosomal rearrangements and altered expression of this gene have been implicated in several neurological, neoplastic, and other types of diseases, including Alzheimer’s disease, Down’s syndrome, epilepsy, amyotrophic lateral sclerosis, melanoma, and type I diabetes. ENSG00000160307 S100 calcium binding protein B
8428 STK24 This gene encodes a serine/threonine protein kinase that functions upstream of mitogen-activated protein kinase (MAPK) signaling. The encoded protein is cleaved into two chains by caspases; the N-terminal fragment (MST3/N) translocates to the nucleus and promotes programmed cell death. There is a pseudogene for this gene on chromosome X. Alternative splicing results in multiple transcript variants. ENSG00000102572 serine/threonine kinase 24
ENSG00000225630 MTND2P28 NA ENSG00000225630 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 2 pseudogene 28
9314 KLF4 This gene encodes a protein that belongs to the Kruppel family of transcription factors. The encoded zinc finger protein is required for normal development of the barrier function of skin. The encoded protein is thought to control the G1-to-S transition of the cell cycle following DNA damage by mediating the tumor suppressor gene p53. Mice lacking this gene have a normal appearance but lose weight rapidly, and die shortly after birth due to fluid evaporation resulting from compromised epidermal barrier function. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000136826 Kruppel like factor 4
2688 GH1 The protein encoded by this gene is a member of the somatotropin/prolactin family of hormones which play an important role in growth control. The gene, along with four other related genes, is located at the growth hormone locus on chromosome 17 where they are interspersed in the same transcriptional orientation; an arrangement which is thought to have evolved by a series of gene duplications. The five genes share a remarkably high degree of sequence identity. Alternative splicing generates additional isoforms of each of the five growth hormones, leading to further diversity and potential for specialization. This particular family member is expressed in the pituitary but not in placental tissue as is the case for the other four genes in the growth hormone locus. Mutations in or deletions of the gene lead to growth hormone deficiency and short stature. ENSG00000259384 growth hormone 1
360 AQP3 This gene encodes the water channel protein aquaporin 3. Aquaporins are a family of small integral membrane proteins related to the major intrinsic protein, also known as aquaporin 0. Aquaporin 3 is localized at the basal lateral membranes of collecting duct cells in the kidney. In addition to its water channel function, aquaporin 3 has been found to facilitate the transport of nonionic small solutes such as urea and glycerol, but to a smaller degree. It has been suggested that water channels can be functionally heterogeneous and possess water and solute permeation mechanisms. Alternative splicing of this gene results in multiple transcript variants encoding different isoforms. ENSG00000165272 aquaporin 3 (Gill blood group)
54739 XAF1 This gene encodes a protein which binds to and counteracts the inhibitory effect of a member of the IAP (inhibitor of apoptosis) protein family. IAP proteins bind to and inhibit caspases which are activated during apoptosis. The proportion of IAPs and proteins which interfere with their activity, such as the encoded protein, affect the progress of the apoptosis signaling pathway. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000132530 XIAP associated factor 1
1191 CLU The protein encoded by this gene is a secreted chaperone that can under some stress conditions also be found in the cell cytosol. It has been suggested to be involved in several basic biological events such as cell death, tumor progression, and neurodegenerative disorders. Alternate splicing results in both coding and non-coding variants. ENSG00000120885 clusterin
10135 NAMPT This gene encodes a protein that catalyzes the condensation of nicotinamide with 5-phosphoribosyl-1-pyrophosphate to yield nicotinamide mononucleotide, one step in the biosynthesis of nicotinamide adenine dinucleotide. The protein belongs to the nicotinic acid phosphoribosyltransferase (NAPRTase) family and is thought to be involved in many important biological processes, including metabolism, stress response and aging. This gene has a pseudogene on chromosome 10. ENSG00000105835 nicotinamide phosphoribosyltransferase
220323 OAF NA ENSG00000184232 out at first homolog
50649 ARHGEF4 Rho GTPases play a fundamental role in numerous cellular processes that are initiated by extracellular stimuli that work through G protein coupled receptors. The protein encoded by this gene may form complex with G proteins and stimulate Rho-dependent signals. Multiple alternatively spliced transcript variants encoding different isoforms have been found, but the full-length nature of some variants has not been determined. ENSG00000136002 Rho guanine nucleotide exchange factor 4
2266 FGG The protein encoded by this gene is the gamma component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia and thrombophilia. Alternative splicing results in transcript variants encoding different isoforms. ENSG00000171557 fibrinogen gamma chain
5919 RARRES2 This gene encodes a secreted chemotactic protein that initiates chemotaxis via the ChemR23 G protein-coupled seven-transmembrane domain ligand. Expression of this gene is upregulated by the synthetic retinoid tazarotene and occurs in a wide variety of tissues. The active protein has several roles, including that as an adipokine and as an antimicrobial protein with activity against bacteria and fungi. ENSG00000106538 retinoic acid receptor responder 2
7534 YWHAZ This gene product belongs to the 14-3-3 family of proteins which mediate signal transduction by binding to phosphoserine-containing proteins. This highly conserved protein family is found in both plants and mammals, and this protein is 99% identical to the mouse, rat and sheep orthologs. The encoded protein interacts with IRS1 protein, suggesting a role in regulating insulin sensitivity. Several transcript variants that differ in the 5’ UTR but that encode the same protein have been identified for this gene. ENSG00000164924 tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein zeta
1292 COL6A2 This gene encodes one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The product of this gene contains several domains similar to von Willebrand Factor type A domains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in this gene are associated with Bethlem myopathy and Ullrich scleroatonic muscular dystrophy. Three transcript variants have been identified for this gene. ENSG00000142173 collagen type VI alpha 2
134147 CMBL CMBL (EC 3.1.1.45) is a cysteine hydrolase of the dienelactone hydrolase family that is highly expressed in liver cytosol. CMBL preferentially cleaves cyclic esters, and it activates medoxomil-ester prodrugs in which the medoxomil moiety is linked to an oxygen atom (Ishizuka et al., 2010 [PubMed 20177059]). ENSG00000164237 carboxymethylenebutenolidase homolog (Pseudomonas)
5004 ORM1 This gene encodes a key acute phase plasma protein. Because of its increase due to acute inflammation, this protein is classified as an acute-phase reactant. The specific function of this protein has not yet been determined; however, it may be involved in aspects of immunosuppression. ENSG00000229314 orosomucoid 1
57402 S100A14 This gene encodes a member of the S100 protein family which contains an EF-hand motif and binds calcium. The gene is located in a cluster of S100 genes on chromosome 1. Levels of the encoded protein have been found to be lower in cancerous tissue and associated with metastasis suggesting a tumor suppressor function (PMID: 19956863, 19351828). ENSG00000189334 S100 calcium binding protein A14
488 ATP2A2 This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol into the sarcoplasmic reticulum lumen, and is involved in regulation of the contraction/relaxation cycle. Mutations in this gene cause Darier-White disease, also known as keratosis follicularis, an autosomal dominant skin disorder characterized by loss of adhesion between epidermal cells and abnormal keratinization. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000174437 ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 2
ENSG00000180139 ACTA2-AS1 NA ENSG00000180139 ACTA2 antisense RNA 1
3512 JCHAIN NA ENSG00000132465 joining chain of multimeric IgA and IgM
9620 CELSR1 The protein encoded by this gene is a member of the flamingo subfamily, part of the cadherin superfamily. The flamingo subfamily consists of nonclassic-type cadherins; a subpopulation that does not interact with catenins. The flamingo cadherins are located at the plasma membrane and have nine cadherin domains, seven epidermal growth factor-like repeats and two laminin A G-type repeats in their ectodomain. They also have seven transmembrane domains, a characteristic unique to this subfamily. It is postulated that these proteins are receptors involved in contact-mediated communication, with cadherin domains acting as homophilic binding regions and the EGF-like domains involved in cell adhesion and receptor-ligand interactions. This particular member is a developmentally regulated, neural-specific gene which plays an unspecified role in early embryogenesis. ENSG00000075275 cadherin EGF LAG seven-pass G-type receptor 1
10057 ABCC5 The protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intra-cellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the MRP subfamily which is involved in multi-drug resistance. This protein functions in the cellular export of its substrate, cyclic nucleotides. This export contributes to the degradation of phosphodiesterases and possibly an elimination pathway for cyclic nucleotides. Studies show that this protein provides resistance to thiopurine anticancer drugs, 6-mercaptopurine and thioguanine, and the anti-HIV drug 9-(2-phosphonylmethoxyethyl)adenine. This protein may be involved in resistance to thiopurines in acute lymphoblastic leukemia and antiretroviral nucleoside analogs in HIV-infected patients. Alternative splicing results in multiple transcript variants. ENSG00000114770 ATP binding cassette subfamily C member 5
5950 RBP4 This protein belongs to the lipocalin family and is the specific carrier for retinol (vitamin A alcohol) in the blood. It delivers retinol from the liver stores to the peripheral tissues. In plasma, the RBP-retinol complex interacts with transthyretin which prevents its loss by filtration through the kidney glomeruli. A deficiency of vitamin A blocks secretion of the binding protein posttranslationally and results in defective delivery and supply to the epidermal cells. ENSG00000138207 retinol binding protein 4
149428 BNIPL The protein encoded by this gene interacts with several other proteins, such as BCL2, ARHGAP1, MIF and GFER. It may function as a bridge molecule between BCL2 and ARHGAP1/CDC42 in promoting cell death. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. ENSG00000163141 BCL2/adenovirus E1B 19kD interacting protein like
2244 FGB The protein encoded by this gene is the beta component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including afibrinogenemia, dysfibrinogenemia, hypodysfibrinogenemia and thrombotic tendency. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000171564 fibrinogen beta chain
1360 CPB1 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma. ENSG00000153002 carboxypeptidase B1
25900 IFFO1 This gene is a member of the intermediate filament family. Intermediate filaments are proteins which are primordial components of the cytoskeleton and nuclear envelope. The proteins encoded by the members of this gene family are evolutionarily and structurally related but have limited sequence homology, with the exception of the central rod domain. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000010295 intermediate filament family orphan 1
23344 ESYT1 NA ENSG00000139641 extended synaptotagmin protein 1
58498 MYL7 NA ENSG00000106631 myosin light chain 7
29842 TFCP2L1 NA ENSG00000115112 transcription factor CP2-like 1
ENSG00000261054 RP11-6O2.4 NA ENSG00000261054 NA
2052 EPHX1 Epoxide hydrolase is a critical biotransformation enzyme that converts epoxides from the degradation of aromatic compounds to trans-dihydrodiols which can be conjugated and excreted from the body. Epoxide hydrolase functions in both the activation and detoxification of epoxides. Mutations in this gene cause preeclampsia, epoxide hydrolase deficiency or increased epoxide hydrolase activity. Alternatively spliced transcript variants encoding the same protein have been found for this gene. ENSG00000143819 epoxide hydrolase 1
4629 MYH11 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000133392 myosin, heavy chain 11, smooth muscle
ENSG00000229732 AC019349.5 NA ENSG00000229732 NA
8766 RAB11A The protein encoded by this gene belongs to the Rab family of the small GTPase superfamily. It is associated with both constitutive and regulated secretory pathways, and may be involved in protein transport. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000103769 RAB11A, member RAS oncogene family
84649 DGAT2 This gene encodes one of two enzymes which catalyzes the final reaction in the synthesis of triglycerides in which diacylglycerol is covalently bound to long chain fatty acyl-CoAs. The encoded protein catalyzes this reaction at low concentrations of magnesium chloride while the other enzyme has high activity at high concentrations of magnesium chloride. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000062282 diacylglycerol O-acyltransferase 2
229 ALDOB Fructose-1,6-bisphosphate aldolase (EC 4.1.2.13) is a tetrameric glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Vertebrates have 3 aldolase isozymes which are distinguished by their electrophoretic and catalytic properties. Differences indicate that aldolases A, B, and C are distinct proteins, the products of a family of related ‘housekeeping’ genes exhibiting developmentally regulated expression of the different isozymes. The developing embryo produces aldolase A, which is produced in even greater amounts in adult muscle where it can be as much as 5% of total cellular protein. In adult liver, kidney and intestine, aldolase A expression is repressed and aldolase B is produced. In brain and other nervous tissue, aldolase A and C are expressed about equally. There is a high degree of homology between aldolase A and C. Defects in ALDOB cause hereditary fructose intolerance. ENSG00000136872 aldolase, fructose-bisphosphate B
7448 VTN The protein encoded by this gene is a member of the pexin family. It is found in serum and tissues and promotes cell adhesion and spreading, inhibits the membrane-damaging effect of the terminal cytolytic complement pathway, and binds to several serpin serine protease inhibitors. It is a secreted protein and exists in either a single chain form or a clipped, two chain form held together by a disulfide bond. ENSG00000109072 vitronectin
100528017 SAA2-SAA4 This locus represents naturally occurring read-through transcription between the neighboring serum amyloid A2 and serum amyloid A4 genes on chromosome 11. The read-through transcript produces a fusion protein that shares sequence identity with each individual gene product. ENSG00000255071 SAA2-SAA4 readthrough
5208 PFKFB2 The protein encoded by this gene is involved in both the synthesis and degradation of fructose-2,6-bisphosphate, a regulatory molecule that controls glycolysis in eukaryotes. The encoded protein has a 6-phosphofructo-2-kinase activity that catalyzes the synthesis of fructose-2,6-bisphosphate, and a fructose-2,6-biphosphatase activity that catalyzes the degradation of fructose-2,6-bisphosphate. This protein regulates fructose-2,6-bisphosphate levels in the heart, while a related enzyme encoded by a different gene regulates fructose-2,6-bisphosphate levels in the liver and muscle. This enzyme functions as a homodimer. Two transcript variants encoding two different isoforms have been found for this gene. ENSG00000123836 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 2
1264 CNN1 NA ENSG00000130176 calponin 1
1281 COL3A1 This gene encodes the pro-alpha1 chains of type III collagen, a fibrillar collagen that is found in extensible connective tissues such as skin, lung, uterus, intestine and the vascular system, frequently in association with type I collagen. Mutations in this gene are associated with Ehlers-Danlos syndrome types IV, and with aortic and arterial aneurysms. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. ENSG00000168542 collagen type III alpha 1 chain
23022 PALLD This gene encodes a cytoskeletal protein that is required for organizing the actin cytoskeleton. The protein is a component of actin-containing microfilaments, and it is involved in the control of cell shape, adhesion, and contraction. Polymorphisms in this gene are associated with a susceptibility to pancreatic cancer type 1, and also with a risk for myocardial infarction. Alternative splicing results in multiple transcript variants. ENSG00000129116 palladin, cytoskeletal associated protein
7538 ZFP36 NA ENSG00000128016 ZFP36 ring finger protein
3849 KRT2 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000172867 keratin 2
5187 PER1 This gene is a member of the Period family of genes and is expressed in a circadian pattern in the suprachiasmatic nucleus, the primary circadian pacemaker in the mammalian brain. Genes in this family encode components of the circadian rhythms of locomotor activity, metabolism, and behavior. This gene is upregulated by CLOCK/ARNTL heterodimers but then represses this upregulation in a feedback loop using PER/CRY heterodimers to interact with CLOCK/ARNTL. Polymorphisms in this gene may increase the risk of getting certain cancers. Alternative splicing has been observed in this gene; however, these variants have not been fully described. ENSG00000179094 period circadian clock 1
7173 TPO This gene encodes a membrane-bound glycoprotein. The encoded protein acts as an enzyme and plays a central role in thyroid gland function. The protein functions in the iodination of tyrosine residues in thyroglobulin and phenoxy-ester formation between pairs of iodinated tyrosines to generate the thyroid hormones, thyroxine and triiodothyronine. Mutations in this gene are associated with several disorders of thyroid hormonogenesis, including congenital hypothyroidism, congenital goiter, and thyroid hormone organification defect IIA. Multiple transcript variants encoding distinct isoforms have been identified for this gene, but the full-length nature of some variants has not been determined. ENSG00000115705 thyroid peroxidase
4311 MME This gene encodes a common acute lymphocytic leukemia antigen that is an important cell surface marker in the diagnosis of human acute lymphocytic leukemia (ALL). This protein is present on leukemic cells of pre-B phenotype, which represent 85% of cases of ALL. This protein is not restricted to leukemic cells, however, and is found on a variety of normal tissues. It is a glycoprotein that is particularly abundant in kidney, where it is present on the brush border of proximal tubules and on glomerular epithelium. The protein is a neutral endopeptidase that cleaves peptides at the amino side of hydrophobic residues and inactivates several peptide hormones including glucagon, enkephalins, substance P, neurotensin, oxytocin, and bradykinin. This gene, which encodes a 100-kD type II transmembrane glycoprotein, exists in a single copy of greater than 45 kb. The 5’ untranslated region of this gene is alternatively spliced, resulting in four separate mRNA transcripts. The coding region is not affected by alternative splicing. ENSG00000196549 membrane metallo-endopeptidase
# Save the Ensembl IDs queried for factor 8, one ID per line.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 8, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);
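
The query-and-save steps are repeated for each factor with only the factor index changing. A minimal sketch of a helper that wraps them follows. The name annotate_factor and its arguments out_dir and query_fun are assumptions made for illustration and are not part of the original analysis; query_fun defaults to mygene::queryMany and is exposed only so the helper can be tried offline with a stub. The sketch writes the IDs with as.character rather than as.factor, which should produce the same file contents.

```r
# Hypothetical helper (not in the original analysis): annotate the top genes
# of factor k and save the queried Ensembl IDs, one per line.
annotate_factor <- function(k, gene_list, out_dir,
                            query_fun = mygene::queryMany) {
  # Same query as in the per-factor chunks: Ensembl gene IDs in,
  # name / summary / symbol out.
  out <- query_fun(gene_list[k, ], scopes = "ensembl.gene",
                   fields = c("name", "summary", "symbol"),
                   species = "human")

  # File name follows the gene_names_clus_<k>.txt pattern used above.
  out_file <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(as.character(out$query), out_file,
              col.names = FALSE, row.names = FALSE, quote = FALSE)

  # Return the annotation table invisibly so it can be passed to kable().
  invisible(as.data.frame(out))
}
```

Under these assumptions, kable(annotate_factor(9, gene_list, "../utilities/GTEX2013_sparse_fac_sqrt")) would stand in for the Factor 9 query, table, and write steps.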

Factor 9 Annotations

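# Annotate the top genes of factor 9 via MyGene.info (Ensembl gene IDs in; name, summary and symbol out).
out <- mygene::queryMany(gene_list[9,], scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");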
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name summary X_id query symbol notfound
desmin This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. 1674 ENSG00000175084 DES NA
myosin, heavy chain 11, smooth muscle The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. 4629 ENSG00000133392 MYH11 NA
fibronectin 1 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. 2335 ENSG00000115414 FN1 NA
albumin Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. 213 ENSG00000163631 ALB NA
actin, beta This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. 60 ENSG00000075624 ACTB NA
myosin light chain 9 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. 10398 ENSG00000101335 MYL9 NA
serpin family A member 1 The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. 5265 ENSG00000197249 SERPINA1 NA
gelsolin The protein encoded by this gene binds to the ‘plus’ ends of actin monomers and filaments to prevent monomer exchange. The encoded calcium-regulated protein functions in both assembly and disassembly of actin filaments. Defects in this gene are a cause of familial amyloidosis Finnish type (FAF). Multiple transcript variants encoding several different isoforms have been found for this gene. 2934 ENSG00000148180 GSN NA
fibrinogen alpha chain This gene encodes the alpha subunit of the coagulation factor fibrinogen, which is a component of the blood clot. Following vascular injury, the encoded preproprotein is proteolytically processed by thrombin during the conversion of fibrinogen to fibrin. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia, afibrinogenemia and renal amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. 2243 ENSG00000171560 FGA NA
fibrinogen beta chain The protein encoded by this gene is the beta component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including afibrinogenemia, dysfibrinogenemia, hypodysfibrinogenemia and thrombotic tendency. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 2244 ENSG00000171564 FGB NA
vimentin This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. 7431 ENSG00000026025 VIM NA
NA NA NA ENSG00000259716 NA TRUE
cysteine and glycine rich protein 1 This gene encodes a member of the cysteine-rich protein (CSRP) family. This gene family includes a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this gene product occurs in proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Alternatively spliced transcript variants have been described. 1465 ENSG00000159176 CSRP1 NA
myoglobin This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. 4151 ENSG00000198125 MB NA
myosin, heavy chain 7, cardiac muscle, beta Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. 4625 ENSG00000092054 MYH7 NA
fibrinogen gamma chain The protein encoded by this gene is the gamma component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia and thrombophilia. Alternative splicing results in transcript variants encoding different isoforms. 2266 ENSG00000171557 FGG NA
orosomucoid 1 This gene encodes a key acute phase plasma protein. Because of its increase due to acute inflammation, this protein is classified as an acute-phase reactant. The specific function of this protein has not yet been determined; however, it may be involved in aspects of immunosuppression. 5004 ENSG00000229314 ORM1 NA
myosin light chain kinase This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3’ region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts. 4638 ENSG00000065534 MYLK NA
thyroglobulin Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. 7038 ENSG00000042832 TG NA
myosin light chain 2 This gene encodes the regulatory light chain associated with cardiac myosin beta (or slow) heavy chain. Ca2+ triggers the phosphorylation of regulatory light chain that in turn triggers contraction. Mutations in this gene are associated with mid-left ventricular chamber type hypertrophic cardiomyopathy. 4633 ENSG00000111245 MYL2 NA
fibulin 2 This gene encodes an extracellular matrix protein, which belongs to the fibulin family. This protein binds various extracellular ligands and calcium. It may play a role during organ development, in particular, during the differentiation of heart, skeletal and neuronal structures. Alternatively spliced transcript variants encoding different isoforms have been identified. 2199 ENSG00000163520 FBLN2 NA
C-reactive protein, pentraxin-related The protein encoded by this gene belongs to the pentaxin family. It is involved in several host defense related functions based on its ability to recognize foreign pathogens and damaged cells of the host and to initiate their elimination by interacting with humoral and cellular effector systems in the blood. Consequently, the level of this protein in plasma increases greatly during acute phase response to tissue injury, infection, or other inflammatory stimuli. 1401 ENSG00000132693 CRP NA
coiled-coil domain containing 80 NA 151887 ENSG00000091986 CCDC80 NA
lipoprotein lipase LPL encodes lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue. LPL functions as a homodimer, and has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Severe mutations that cause LPL deficiency result in type I hyperlipoproteinemia, while less extreme mutations in LPL are linked to many disorders of lipoprotein metabolism. 4023 ENSG00000175445 LPL NA
actinin alpha 1 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, cytoskeletal, alpha actinin isoform and maps to the same site as the structurally similar erythroid beta spectrin gene. Three transcript variants encoding different isoforms have been found for this gene. 87 ENSG00000072110 ACTN1 NA
annexin A2 This gene encodes a member of the annexin family. Members of this calcium-dependent phospholipid-binding protein family play a role in the regulation of cellular growth and in signal transduction pathways. This protein functions as an autocrine factor which heightens osteoclast formation and bone resorption. This gene has three pseudogenes located on chromosomes 4, 9 and 10, respectively. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. 302 ENSG00000182718 ANXA2 NA
tensin 1 The protein encoded by this gene localizes to focal adhesions, regions of the plasma membrane where the cell attaches to the extracellular matrix. This protein crosslinks actin filaments and contains a Src homology 2 (SH2) domain, which is often found in molecules involved in signal transduction. This protein is a substrate of calpain II. Alternative splicing results in multiple transcript variants encoding different isoforms. 7145 ENSG00000079308 TNS1 NA
cytochrome P450 family 2 subfamily E member 1 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum and is induced by ethanol, the diabetic state, and starvation. The enzyme metabolizes both endogenous substrates, such as ethanol, acetone, and acetal, as well as exogenous substrates including benzene, carbon tetrachloride, ethylene glycol, and nitrosamines which are premutagens found in cigarette smoke. Due to its many substrates, this enzyme may be involved in such varied processes as gluconeogenesis, hepatic cirrhosis, diabetes, and cancer. 1571 ENSG00000130649 CYP2E1 NA
ankyrin repeat domain 1 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. 27063 ENSG00000148677 ANKRD1 NA
integral membrane protein 2B Amyloid precursor proteins are processed by beta-secretase and gamma-secretase to produce beta-amyloid peptides which form the characteristic plaques of Alzheimer disease. This gene encodes a transmembrane protein which is processed at the C-terminus by furin or furin-like proteases to produce a small secreted peptide which inhibits the deposition of beta-amyloid. Mutations which result in extension of the C-terminal end of the encoded protein, thereby increasing the size of the secreted peptide, are associated with two neurodegenerative diseases, familial British dementia and familial Danish dementia. 9445 ENSG00000136156 ITM2B NA
apolipoprotein C3 Apolipoprotein C-III is a very low density lipoprotein (VLDL) protein. APOC3 inhibits lipoprotein lipase and hepatic lipase; it is thought to delay catabolism of triglyceride-rich particles. The APOA1, APOC3 and APOA4 genes are closely linked in both rat and human genomes. The A-I and A-IV genes are transcribed from the same strand, while the A-I and C-III genes are convergently transcribed. An increase in apoC-III levels induces the development of hypertriglyceridemia. 345 ENSG00000110245 APOC3 NA
collagen type I alpha 2 chain This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. 1278 ENSG00000164692 COL1A2 NA
beta-2-microglobulin This gene encodes a serum protein found in association with the major histocompatibility complex (MHC) class I heavy chain on the surface of nearly all nucleated cells. The protein has a predominantly beta-pleated sheet structure that can form amyloid fibrils in some pathological conditions. The encoded antimicrobial protein displays antibacterial activity in amniotic fluid. A mutation in this gene has been shown to result in hypercatabolic hypoproteinemia. 567 ENSG00000166710 B2M NA
myosin light chain 3 MYL3 encodes myosin light chain 3, an alkali light chain also referred to in the literature as both the ventricular isoform and the slow skeletal muscle isoform. Mutations in MYL3 have been identified as a cause of mid-left ventricular chamber type hypertrophic cardiomyopathy. 4634 ENSG00000160808 MYL3 NA
crystallin alpha B Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. 1410 ENSG00000109846 CRYAB NA
prostaglandin D2 synthase The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may also be involved in the regulation of non-rapid eye movement sleep. 5730 ENSG00000107317 PTGDS NA
apolipoprotein H Apolipoprotein H has been implicated in a variety of physiologic pathways including lipoprotein metabolism, coagulation, and the production of antiphospholipid autoantibodies. APOH may be a required cofactor for anionic phospholipid binding by the antiphospholipid autoantibodies found in sera of many patients with lupus and primary antiphospholipid syndrome, but it does not seem to be required for the reactivity of antiphospholipid autoantibodies associated with infections. 350 ENSG00000091583 APOH NA
collagen type VI alpha 3 chain This gene encodes the alpha-3 chain, one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The alpha-3 chain of type VI collagen is much larger than the alpha-1 and -2 chains. This difference in size is largely due to an increase in the number of subdomains, similar to von Willebrand Factor type A domains, that are found in the amino terminal globular domain of all the alpha chains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in the type VI collagen genes are associated with Bethlem myopathy, a rare autosomal dominant proximal myopathy with early childhood onset. Mutations in this gene are also a cause of Ullrich congenital muscular dystrophy, also referred to as Ullrich scleroatonic muscular dystrophy, an autosomal recessive congenital myopathy that is more severe than Bethlem myopathy. Multiple transcript variants have been identified, but the full-length nature of only some of these variants has been described. 1293 ENSG00000163359 COL6A3 NA
epithelial membrane protein 1 NA 2012 ENSG00000134531 EMP1 NA
neuronal calcium sensor 1 This gene is a member of the neuronal calcium sensor gene family, which encode calcium-binding proteins expressed predominantly in neurons. The protein encoded by this gene regulates G protein-coupled receptor phosphorylation in a calcium-dependent manner and can substitute for calmodulin. The protein is associated with secretory granules and modulates synaptic transmission and synaptic plasticity. Multiple transcript variants encoding different isoforms have been found for this gene. 23413 ENSG00000107130 NCS1 NA
tropomyosin 2 (beta) This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 7169 ENSG00000198467 TPM2 NA
alpha-1-microglobulin/bikunin precursor This gene encodes a complex glycoprotein secreted in plasma. The precursor is proteolytically processed into distinct functioning proteins: alpha-1-microglobulin, which belongs to the superfamily of lipocalin transport proteins and may play a role in the regulation of inflammatory processes, and bikunin, which is a urinary trypsin inhibitor belonging to the superfamily of Kunitz-type protease inhibitors and plays an important role in many physiological and pathological processes. This gene is located on chromosome 9 in a cluster of lipocalin genes. 259 ENSG00000106927 AMBP NA
troponin C1, slow skeletal and cardiac type Troponin is a central regulatory protein of striated muscle contraction, and together with tropomyosin, is located on the actin filament. Troponin consists of 3 subunits: TnI, which is the inhibitor of actomyosin ATPase; TnT, which contains the binding site for tropomyosin; and TnC, the protein encoded by this gene. The binding of calcium to TnC abolishes the inhibitory action of TnI, thus allowing the interaction of actin with myosin, the hydrolysis of ATP, and the generation of tension. Mutations in this gene are associated with cardiomyopathy dilated type 1Z. 7134 ENSG00000114854 TNNC1 NA
integrin subunit alpha 5 The product of this gene belongs to the integrin alpha chain family. Integrins are heterodimeric integral membrane proteins composed of an alpha subunit and a beta subunit that function in cell surface adhesion and signaling. The encoded preproprotein is proteolytically processed to generate light and heavy chains that comprise the alpha 5 subunit. This subunit associates with the beta 1 subunit to form a fibronectin receptor. This integrin may promote tumor invasion, and higher expression of this gene may be correlated with shorter survival time in lung cancer patients. Note that the integrin alpha 5 and integrin alpha V subunits are encoded by distinct genes. 3678 ENSG00000161638 ITGA5 NA
hemopexin This gene encodes a plasma glycoprotein that binds heme with high affinity. The encoded protein is an acute phase protein that transports heme from the plasma to the liver and may be involved in protecting cells from oxidative stress. 3263 ENSG00000110169 HPX NA
actinin alpha 2 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a muscle-specific, alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Several transcript variants encoding different isoforms have been found for this gene. 88 ENSG00000077522 ACTN2 NA
myosin light chain 12A This gene encodes a nonsarcomeric myosin regulatory light chain. This protein is activated by phosphorylation and regulates smooth muscle and non-muscle cell contraction. This protein may also be involved in DNA damage repair by sequestering the transcriptional regulator apoptosis-antagonizing transcription factor (AATF)/Che-1 which functions as a repressor of p53-driven apoptosis. Alternate splicing results in multiple transcript variants. A pseudogene of this gene is found on chromosome 8. 10627 ENSG00000101608 MYL12A NA
carboxypeptidase A1 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. 1357 ENSG00000091704 CPA1 NA
titin This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, an N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. 7273 ENSG00000155657 TTN NA
microfibrillar associated protein 5 This gene encodes a 25-kD microfibril-associated glycoprotein which is a component of microfibrils of the extracellular matrix. The encoded protein promotes attachment of cells to microfibrils via alpha-V-beta-3 integrin. Deficiency of this gene in mice results in neutropenia. Alternate splicing results in multiple transcript variants encoding different isoforms. 8076 ENSG00000197614 MFAP5 NA
NA NA NA ENSG00000272761 NA TRUE
calponin 1 NA 1264 ENSG00000130176 CNN1 NA
hemoglobin subunit beta The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin cause beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. 3043 ENSG00000244734 HBB NA
endoglin This gene encodes a homodimeric transmembrane protein which is a major glycoprotein of the vascular endothelium. This protein is a component of the transforming growth factor beta receptor complex and it binds to the beta1 and beta3 peptides with high affinity. Mutations in this gene cause hereditary hemorrhagic telangiectasia, also known as Osler-Rendu-Weber syndrome 1, an autosomal dominant multisystemic vascular dysplasia. This gene may also be involved in preeclampsia and several types of cancer. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 2022 ENSG00000106991 ENG NA
myosin binding protein C, cardiac MYBPC3 encodes the cardiac isoform of myosin-binding protein C. Myosin-binding protein C is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. MYBPC3, the cardiac isoform, is expressed exclusively in heart muscle. Regulatory phosphorylation of the cardiac isoform in vivo by cAMP-dependent protein kinase (PKA) upon adrenergic stimulation may be linked to modulation of cardiac contraction. Mutations in MYBPC3 are one cause of familial hypertrophic cardiomyopathy. 4607 ENSG00000134571 MYBPC3 NA
spectrin beta, non-erythrocytic 1 Spectrin is an actin crosslinking and molecular scaffold protein that links the plasma membrane to the actin cytoskeleton, and functions in the determination of cell shape, arrangement of transmembrane proteins, and organization of organelles. It is composed of two antiparallel dimers of alpha- and beta- subunits. This gene is one member of a family of beta-spectrin genes. The encoded protein contains an N-terminal actin-binding domain, and 17 spectrin repeats which are involved in dimer formation. Multiple transcript variants encoding different isoforms have been found for this gene. 6711 ENSG00000115306 SPTBN1 NA
lysyl oxidase This gene encodes a member of the lysyl oxidase family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate a regulatory propeptide and the mature enzyme. The copper-dependent amine oxidase activity of this enzyme functions in the crosslinking of collagens and elastin, while the propeptide may play a role in tumor suppression. 4015 ENSG00000113083 LOX NA
cysteine and glycine rich protein 3 This gene encodes a member of the CSRP family of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this protein is found in a group of proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Mutations in this gene are thought to cause heritable forms of hypertrophic cardiomyopathy (HCM) and dilated cardiomyopathy (DCM) in humans. Alternatively spliced transcript variants with different 5’ UTR, but encoding the same protein, have been found for this gene. 8048 ENSG00000129170 CSRP3 NA
collagen type XII alpha 1 chain This gene encodes the alpha chain of type XII collagen, a member of the FACIT (fibril-associated collagens with interrupted triple helices) collagen family. Type XII collagen is a homotrimer found in association with type I collagen, an association that is thought to modify the interactions between collagen I fibrils and the surrounding matrix. Alternatively spliced transcript variants encoding different isoforms have been identified. 1303 ENSG00000111799 COL12A1 NA
apolipoprotein A1 This gene encodes apolipoprotein A-I, which is the major protein component of high density lipoprotein (HDL) in plasma. The encoded preproprotein is proteolytically processed to generate the mature protein, which promotes cholesterol efflux from tissues to the liver for excretion, and is a cofactor for lecithin-cholesterol acyltransferase (LCAT), an enzyme responsible for the formation of most plasma cholesteryl esters. This gene is closely linked with two other apolipoprotein genes on chromosome 11. Defects in this gene are associated with HDL deficiencies, including Tangier disease, and with systemic non-neuropathic amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein. 335 ENSG00000118137 APOA1 NA
destrin, actin depolymerizing factor The product of this gene belongs to the actin-binding proteins ADF family. This family of proteins is responsible for enhancing the turnover rate of actin in vivo. This gene encodes the actin depolymerizing protein that severs actin filaments (F-actin) and binds to actin monomers (G-actin). Two transcript variants encoding distinct isoforms have been identified for this gene. 11034 ENSG00000125868 DSTN NA
5’-nucleotidase domain containing 3 NA 51559 ENSG00000111696 NT5DC3 NA
PBX homeobox interacting protein 1 The protein encoded by this gene interacts with the PBX1 homeodomain protein, inhibiting its transcriptional activation potential by preventing its binding to DNA. The encoded protein, which is primarily cytosolic but can shuttle to the nucleus, also can interact with estrogen receptors alpha and beta and promote the proliferation of breast cancer, brain tumors, and lung cancer. Several transcript variants encoding different isoforms have been found for this gene. More variants exist, but their full-length natures have yet to be determined. 57326 ENSG00000163346 PBXIP1 NA
ribosomal protein lateral stalk subunit P0 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein, which is the functional equivalent of the E. coli L10 ribosomal protein, belongs to the L10P family of ribosomal proteins. It is a neutral phosphoprotein with a C-terminal end that is nearly identical to the C-terminal ends of the acidic ribosomal phosphoproteins P1 and P2. The P0 protein can interact with P1 and P2 to form a pentameric complex consisting of P1 and P2 dimers, and a P0 monomer. The protein is located in the cytoplasm. Transcript variants derived from alternative splicing exist; they encode the same protein. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. 6175 ENSG00000089157 RPLP0 NA
annexin A5 The protein encoded by this gene belongs to the annexin family of calcium-dependent phospholipid binding proteins some of which have been implicated in membrane-related events along exocytotic and endocytotic pathways. Annexin 5 is a phospholipase A2 and protein kinase C inhibitory protein with calcium channel activity and a potential role in cellular signal transduction, inflammation, growth and differentiation. Annexin 5 has also been described as placental anticoagulant protein I, vascular anticoagulant-alpha, endonexin II, lipocortin V, placental protein 4 and anchorin CII. The gene spans 29 kb containing 13 exons, and encodes a single transcript of approximately 1.6 kb and a protein product with a molecular weight of about 35 kDa. 308 ENSG00000164111 ANXA5 NA
glycoprotein nmb The protein encoded by this gene is a type I transmembrane glycoprotein which shows homology to the pMEL17 precursor, a melanocyte-specific protein. GPNMB shows expression in the lowly metastatic human melanoma cell lines and xenografts but does not show expression in the highly metastatic cell lines. GPNMB may be involved in growth delay and reduction of metastatic potential. Two transcript variants encoding different isoforms have been found for this gene. 10457 ENSG00000136235 GPNMB NA
RNA binding protein with multiple splicing This gene encodes a member of the RNA recognition motif family of RNA-binding proteins. The RNA recognition motif is between 80-100 amino acids in length and family members contain one to four copies of the motif. The RNA recognition motif consists of two short stretches of conserved sequence, as well as a few highly conserved hydrophobic residues. The encoded protein has a single, putative RNA recognition motif in its N-terminus. Alternative splicing results in multiple transcript variants encoding different isoforms. 11030 ENSG00000157110 RBPMS NA
protease, serine 1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. 5644 ENSG00000204983 PRSS1 NA
laminin subunit beta 1 Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. Several isoforms of each chain have been described. Different alpha, beta and gamma chain isomers combine to give rise to different heterotrimeric laminin isoforms which are designated by Arabic numerals in the order of their discovery, i.e. alpha1beta1gamma1 heterotrimer is laminin 1. The biological functions of the different chains and trimer molecules are largely unknown, but some of the chains have been shown to differ with respect to their tissue distribution, presumably reflecting diverse functions in vivo. This gene encodes the beta chain isoform laminin, beta 1. The beta 1 chain has 7 structurally distinct domains which it shares with other beta chain isomers. The C-terminal helical region containing domains I and II are separated by domain alpha, domains III and V contain several EGF-like repeats, and domains IV and VI have a globular conformation. Laminin, beta 1 is expressed in most tissues that produce basement membranes, and is one of the 3 chains constituting laminin 1, the first laminin isolated from Engelbreth-Holm-Swarm (EHS) tumor. A sequence in the beta 1 chain that is involved in cell attachment, chemotaxis, and binding to the laminin receptor was identified and shown to have the capacity to inhibit metastasis. 3912 ENSG00000091136 LAMB1 NA
proline rich coiled-coil 2A A cluster of genes, BAT1-BAT5, has been localized in the vicinity of the genes for TNF alpha and TNF beta. These genes are all within the human major histocompatibility complex class III region. This gene has microsatellite repeats which are associated with the age-at-onset of insulin-dependent diabetes mellitus (IDDM) and possibly thought to be involved with the inflammatory process of pancreatic beta-cell destruction during the development of IDDM. This gene is also a candidate gene for the development of rheumatoid arthritis. Two transcript variants encoding the same protein have been found for this gene. 7916 ENSG00000204469 PRRC2A NA
matrix metallopeptidase 2 This gene is a member of the matrix metalloproteinase (MMP) gene family, that are zinc-dependent enzymes capable of cleaving components of the extracellular matrix and molecules involved in signal transduction. The protein encoded by this gene is a gelatinase A, type IV collagenase, that contains three fibronectin type II repeats in its catalytic site that allow binding of denatured type IV and V collagen and elastin. Unlike most MMP family members, activation of this protein can occur on the cell membrane. This enzyme can be activated extracellularly by proteases, or intracellularly by its S-glutathiolation with no requirement for proteolytical removal of the pro-domain. This protein is thought to be involved in multiple pathways including roles in the nervous system, endometrial menstrual breakdown, regulation of vascularization, and metastasis. Mutations in this gene have been associated with Winchester syndrome and Nodulosis-Arthropathy-Osteolysis (NAO) syndrome. Alternative splicing results in multiple transcript variants encoding different isoforms. 4313 ENSG00000087245 MMP2 NA
nebulin related anchoring protein NA 4892 ENSG00000197893 NRAP NA
apolipoprotein A2 This gene encodes apolipoprotein (apo-) A-II, which is the second most abundant protein of the high density lipoprotein particles. The protein is found in plasma as a monomer, homodimer, or heterodimer with apolipoprotein D. Defects in this gene may result in apolipoprotein A-II deficiency or hypercholesterolemia. 336 ENSG00000158874 APOA2 NA
potassium voltage-gated channel interacting protein 2 This gene encodes a member of the family of voltage-gated potassium (Kv) channel-interacting proteins (KCNIPs), which belongs to the recoverin branch of the EF-hand superfamily. Members of the KCNIP family are small calcium binding proteins. They all have EF-hand-like domains, and differ from each other in the N-terminus. They are integral subunit components of native Kv4 channel complexes. They may regulate A-type currents, and hence neuronal excitability, in response to changes in intracellular calcium. Multiple alternatively spliced transcript variants encoding distinct isoforms have been identified from this gene. 30819 ENSG00000120049 KCNIP2 NA
keratin 4 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3851 ENSG00000170477 KRT4 NA
myosin, heavy chain 6, cardiac muscle, alpha Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. 4624 ENSG00000197616 MYH6 NA
peptidylglycine alpha-amidating monooxygenase This gene encodes a multifunctional protein. The encoded preproprotein is proteolytically processed to generate the mature enzyme. This enzyme includes two domains with distinct catalytic activities, a peptidylglycine alpha-hydroxylating monooxygenase (PHM) domain and a peptidyl-alpha-hydroxyglycine alpha-amidating lyase (PAL) domain. These catalytic domains work sequentially to catalyze the conversion of neuroendocrine peptides to active alpha-amidated products. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. 5066 ENSG00000145730 PAM NA
synaptic Ras GTPase activating protein 1 The protein encoded by this gene is a major component of the postsynaptic density (PSD), a group of proteins found associated with NMDA receptors at synapses. The encoded protein is phosphorylated by calmodulin-dependent protein kinase II and dephosphorylated by NMDA receptor activation. Defects in this gene are a cause of mental retardation autosomal dominant type 5 (MRD5). 8831 ENSG00000197283 SYNGAP1 NA
titin-cap Sarcomere assembly is regulated by the muscle protein titin. Titin is a giant elastic protein with kinase activity that extends half the length of a sarcomere. It serves as a scaffold to which myofibrils and other muscle related proteins are attached. This gene encodes a protein found in striated and cardiac muscle that binds to the titin Z1-Z2 domains and is a substrate of titin kinase, interactions thought to be critical to sarcomere assembly. Mutations in this gene are associated with limb-girdle muscular dystrophy type 2G. 8557 ENSG00000173991 TCAP NA
cathepsin K The protein encoded by this gene is a lysosomal cysteine proteinase involved in bone remodeling and resorption. This protein, which is a member of the peptidase C1 protein family, is predominantly expressed in osteoclasts. However, the encoded protein is also expressed in a significant fraction of human breast cancers, where it could contribute to tumor invasiveness. Mutations in this gene are the cause of pycnodysostosis, an autosomal recessive disease characterized by osteosclerosis and short stature. 1513 ENSG00000143387 CTSK NA
galectin 1 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. This gene product may act as an autocrine negative growth factor that regulates cell proliferation. 3956 ENSG00000100097 LGALS1 NA
apolipoprotein E The protein encoded by this gene is a major apoprotein of the chylomicron. It binds to a specific liver and peripheral cell receptor, and is essential for the normal catabolism of triglyceride-rich lipoprotein constituents. This gene maps to chromosome 19 in a cluster with the related apolipoprotein C1 and C2 genes. Mutations in this gene result in familial dysbetalipoproteinemia, or type III hyperlipoproteinemia (HLP III), in which increased plasma cholesterol and triglycerides are the consequence of impaired clearance of chylomicron and VLDL remnants. Alternative splicing results in multiple transcript variants. 348 ENSG00000130203 APOE NA
aldolase, fructose-bisphosphate B Fructose-1,6-bisphosphate aldolase (EC 4.1.2.13) is a tetrameric glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Vertebrates have 3 aldolase isozymes which are distinguished by their electrophoretic and catalytic properties. Differences indicate that aldolases A, B, and C are distinct proteins, the products of a family of related ‘housekeeping’ genes exhibiting developmentally regulated expression of the different isozymes. The developing embryo produces aldolase A, which is produced in even greater amounts in adult muscle where it can be as much as 5% of total cellular protein. In adult liver, kidney and intestine, aldolase A expression is repressed and aldolase B is produced. In brain and other nervous tissue, aldolase A and C are expressed about equally. There is a high degree of homology between aldolase A and C. Defects in ALDOB cause hereditary fructose intolerance. 229 ENSG00000136872 ALDOB NA
ubiquitin protein ligase E3 component n-recognin 4 The protein encoded by this gene is an E3 ubiquitin-protein ligase that interacts with the retinoblastoma-associated protein in the nucleus and with calcium-bound calmodulin in the cytoplasm. The encoded protein appears to be a cytoskeletal component in the cytoplasm and part of the chromatin scaffold in the nucleus. In addition, this protein is a target of the human papillomavirus type 16 E7 oncoprotein. 23352 ENSG00000127481 UBR4 NA
4-hydroxyphenylpyruvate dioxygenase The protein encoded by this gene is an enzyme in the catabolic pathway of tyrosine. The encoded protein catalyzes the conversion of 4-hydroxyphenylpyruvate to homogentisate. Defects in this gene are a cause of tyrosinemia type 3 (TYRO3) and hawkinsinuria (HAWK). Two transcript variants encoding different isoforms have been found for this gene. 3242 ENSG00000158104 HPD NA
latent transforming growth factor beta binding protein 2 The protein encoded by this gene belongs to the family of latent transforming growth factor (TGF)-beta binding proteins (LTBP), which are extracellular matrix proteins with multi-domain structure. This protein is the largest member of the LTBP family possessing unique regions and with most similarity to the fibrillins. It has thus been suggested that it may have multiple functions: as a member of the TGF-beta latent complex, as a structural component of microfibrils, and a role in cell adhesion. 4053 ENSG00000119681 LTBP2 NA
NA NA ENSG00000234961 ENSG00000234961 RP11-124N14.3 NA
chymotrypsin like elastase family member 3A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. 10136 ENSG00000142789 CELA3A NA
ACTA2 antisense RNA 1 NA ENSG00000180139 ENSG00000180139 ACTA2-AS1 NA
troponin T2, cardiac type The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms, however, the full-length nature of some of these variants has not yet been determined. 7139 ENSG00000118194 TNNT2 NA
actinin alpha 4 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, alpha actinin isoform which is concentrated in the cytoplasm, and thought to be involved in metastatic processes. Mutations in this gene have been associated with focal and segmental glomerulosclerosis. 81 ENSG00000130402 ACTN4 NA
elastin This gene encodes a protein that is one of the two components of elastic fibers. The encoded protein is rich in hydrophobic amino acids such as glycine and proline, which form mobile hydrophobic regions bounded by crosslinks between lysine residues. Deletions and mutations in this gene are associated with supravalvular aortic stenosis (SVAS) and autosomal dominant cutis laxa. Multiple transcript variants encoding different isoforms have been found for this gene. 2006 ENSG00000049540 ELN NA
transgelin 2 The protein encoded by this gene is similar to the protein transgelin, which is one of the earliest markers of differentiated smooth muscle. The specific function of this protein has not yet been determined, although it is thought to be a tumor suppressor. Multiple transcript variants encoding different isoforms have been found for this gene. 8407 ENSG00000158710 TAGLN2 NA
leiomodin 1 The leiomodin 1 protein has a putative membrane-spanning region and 2 types of tandemly repeated blocks. The transcript is expressed in all tissues tested, with the highest levels in thyroid, eye muscle, skeletal muscle, and ovary. Increased expression of leiomodin 1 may be linked to Graves’ disease and thyroid-associated ophthalmopathy. 25802 ENSG00000163431 LMOD1 NA
actin binding LIM protein 1 This gene encodes a cytoskeletal LIM protein that binds to actin filaments via a domain that is homologous to erythrocyte dematin. LIM domains, found in over 60 proteins, play key roles in the regulation of developmental pathways. LIM domains also function as protein-binding interfaces, mediating specific protein-protein interactions. The protein encoded by this gene could mediate such interactions between actin filaments and cytoplasmic targets. Alternatively spliced transcript variants encoding different isoforms have been identified. 3983 ENSG00000099204 ABLIM1 NA
actin, alpha 2, smooth muscle, aorta The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. 59 ENSG00000107796 ACTA2 NA
actin, alpha 1, skeletal muscle The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. 58 ENSG00000143632 ACTA1 NA
TIMP metallopeptidase inhibitor 2 This gene is a member of the TIMP gene family. The proteins encoded by this gene family are natural inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix. In addition to an inhibitory role against metalloproteinases, the encoded protein has a unique role among TIMP family members in its ability to directly suppress the proliferation of endothelial cells. As a result, the encoded protein may be critical to the maintenance of tissue homeostasis by suppressing the proliferation of quiescent tissues in response to angiogenic factors, and by inhibiting protease activity in tissues undergoing remodelling of the extracellular matrix. 7077 ENSG00000035862 TIMP2 NA
kinesin family member 5A This gene encodes a member of the kinesin family of proteins. Members of this family are part of a multisubunit complex that functions as a microtubule motor in intracellular organelle transport. Mutations in this gene cause autosomal dominant spastic paraplegia 10. 3798 ENSG00000155980 KIF5A NA
nuclear receptor subfamily 4 group A member 1 This gene encodes a member of the steroid-thyroid hormone-retinoid receptor superfamily. Expression is induced by phytohemagglutinin in human lymphocytes and by serum stimulation of arrested fibroblasts. The encoded protein acts as a nuclear transcription factor. Translocation of the protein from the nucleus to mitochondria induces apoptosis. Multiple transcript variants encoding different isoforms have been found for this gene. 3164 ENSG00000123358 NR4A1 NA
# Save the Ensembl gene IDs queried above, one per line, for downstream use.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_",9,".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);

Factor 10 Annotations

out <- mygene::queryMany(gene_list[10,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
kable(as.data.frame(out))
query name X_id symbol summary
ENSG00000133112 tumor protein, translationally-controlled 1 7178 TPT1 NA
ENSG00000156508 eukaryotic translation elongation factor 1 alpha 1 1915 EEF1A1 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas, and the other isoform (alpha 2) is expressed in brain, heart and skeletal muscle. This isoform is identified as an autoantigen in 66% of patients with Felty syndrome. This gene has been found to have multiple copies on many chromosomes, some of which, if not all, represent different pseudogenes.
ENSG00000237973 MT-CO1 pseudogene 12 ENSG00000237973 MTCO1P12 NA
ENSG00000100316 ribosomal protein L3 6122 RPL3 Ribosomes, the complexes that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L3P family of ribosomal proteins and it is located in the cytoplasm. The protein can bind to the HIV-1 TAR mRNA, and it has been suggested that the protein contributes to tat-mediated transactivation. This gene is co-transcribed with several small nucleolar RNA genes, which are located in several of this gene’s introns. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000142541 ribosomal protein L13a 23521 RPL13A Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a member of the L13P family of ribosomal proteins that is a component of the 60S subunit. The encoded protein also plays a role in the repression of inflammatory genes as a component of the IFN-gamma-activated inhibitor of translation (GAIT) complex. This gene is co-transcribed with the small nucleolar RNA genes U32, U33, U34, and U35, which are located in the second, fourth, fifth, and sixth introns, respectively. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed throughout the genome. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene.
ENSG00000115461 insulin like growth factor binding protein 5 3488 IGFBP5 NA
ENSG00000142937 ribosomal protein S8 6202 RPS8 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S8E family of ribosomal proteins. It is located in the cytoplasm. Increased expression of this gene in colorectal tumors and colon polyps compared to matched normal colonic mucosa has been observed. This gene is co-transcribed with the small nucleolar RNA genes U38A, U38B, U39, and U40, which are located in its fourth, fifth, first, and second introns, respectively. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000137154 ribosomal protein S6 6194 RPS6 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a cytoplasmic ribosomal protein that is a component of the 40S subunit. The protein belongs to the S6E family of ribosomal proteins. It is the major substrate of protein kinases in the ribosome, with subsets of five C-terminal serine residues phosphorylated by different protein kinases. Phosphorylation is induced by a wide range of stimuli, including growth factors, tumor-promoting agents, and mitogens. Dephosphorylation occurs at growth arrest. The protein may contribute to the control of cell growth and proliferation through the selective translation of particular classes of mRNA. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000167526 ribosomal protein L13 6137 RPL13 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L13E family of ribosomal proteins. It is located in the cytoplasm. This gene is expressed at significantly higher levels in benign breast lesions than in breast carcinomas. Alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000149273 ribosomal protein S3 6188 RPS3 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit, where it forms part of the domain where translation is initiated. The protein belongs to the S3P family of ribosomal proteins. Studies of the mouse and rat proteins have demonstrated that the protein has an extraribosomal role as an endonuclease involved in the repair of UV-induced DNA damage. The protein appears to be located in both the cytoplasm and nucleus but not in the nucleolus. Higher levels of expression of this gene in colon adenocarcinomas and adenomatous polyps compared to adjacent normal colonic mucosa have been observed. This gene is co-transcribed with the small nucleolar RNA genes U15A and U15B, which are located in its first and fifth introns, respectively. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene.
ENSG00000142676 ribosomal protein L11 6135 RPL11 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L5P family of ribosomal proteins. It is located in the cytoplasm. The protein probably associates with the 5S rRNA. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000140988 ribosomal protein S2 6187 RPS2 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S5P family of ribosomal proteins. It is located in the cytoplasm. This gene shares sequence similarity with mouse LLRep3. It is co-transcribed with the small nucleolar RNA gene U64, which is located in its third intron. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000170889 ribosomal protein S9 6203 RPS9 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S4P family of ribosomal proteins. It is located in the cytoplasm. Variable expression of this gene in colorectal cancers compared to adjacent normal tissues has been observed, although no correlation between the level of expression and the severity of the disease has been found. As is typical for genes encoding ribosomal proteins, multiple processed pseudogenes derived from this gene are dispersed through the genome.
ENSG00000148303 ribosomal protein L7a 6130 RPL7A Cytoplasmic ribosomes, organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L7AE family of ribosomal proteins. It can interact with a subclass of nuclear hormone receptors, including thyroid hormone receptor, and inhibit their ability to transactivate by preventing their binding to their DNA response elements. This gene is included in the surfeit gene cluster, a group of very tightly linked genes that do not share sequence similarity. It is co-transcribed with the U24, U36a, U36b, and U36c small nucleolar RNA genes, which are located in its second, fifth, fourth, and sixth introns, respectively. This gene rearranges with the trk proto-oncogene to form the chimeric oncogene trk-2h, which encodes an oncoprotein consisting of the N terminus of ribosomal protein L7a fused to the receptor tyrosine kinase domain of trk. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000125691 ribosomal protein L23 9349 RPL23 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L14P family of ribosomal proteins. It is located in the cytoplasm. This gene has been referred to as rpL17 because the encoded protein shares amino acid identity with ribosomal protein L17 from Saccharomyces cerevisiae; however, its official symbol is RPL23. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000071082 ribosomal protein L31 6160 RPL31 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L31E family of ribosomal proteins. It is located in the cytoplasm. Higher levels of expression of this gene in familial adenomatous polyps compared to matched normal tissues have been observed. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Alternatively spliced transcript variants encoding distinct isoforms have been found for this gene.
ENSG00000177600 ribosomal protein lateral stalk subunit P2 6181 RPLP2 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal phosphoprotein that is a component of the 60S subunit. The protein, which is a functional equivalent of the E. coli L7/L12 ribosomal protein, belongs to the L12P family of ribosomal proteins. It plays an important role in the elongation step of protein synthesis. Unlike most ribosomal proteins, which are basic, the encoded protein is acidic. Its C-terminal end is nearly identical to the C-terminal ends of the ribosomal phosphoproteins P0 and P1. The P2 protein can interact with P0 and P1 to form a pentameric complex consisting of P1 and P2 dimers, and a P0 monomer. The protein is located in the cytoplasm. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000089157 ribosomal protein lateral stalk subunit P0 6175 RPLP0 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein, which is the functional equivalent of the E. coli L10 ribosomal protein, belongs to the L10P family of ribosomal proteins. It is a neutral phosphoprotein with a C-terminal end that is nearly identical to the C-terminal ends of the acidic ribosomal phosphoproteins P1 and P2. The P0 protein can interact with P1 and P2 to form a pentameric complex consisting of P1 and P2 dimers, and a P0 monomer. The protein is located in the cytoplasm. Transcript variants derived from alternative splicing exist; they encode the same protein. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000142534 ribosomal protein S11 6205 RPS11 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a member of the S17P family of ribosomal proteins that is a component of the 40S subunit. This gene is co-transcribed with the small nucleolar RNA gene U35B, which is located in the third intron. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed throughout the genome.
ENSG00000236552 ribosomal protein L13a pseudogene 5 728658 RPL13AP5 NA
ENSG00000108298 ribosomal protein L19 6143 RPL19 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L19E family of ribosomal proteins. It is located in the cytoplasm. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000175084 desmin 1674 DES This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies.
ENSG00000108107 ribosomal protein L28 6158 RPL28 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L28E family of ribosomal proteins. It is located in the cytoplasm. Variable expression of this gene in colorectal cancers compared to adjacent normal tissues has been observed, although no correlation between the level of expression and the severity of the disease has been found. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Alternative splicing results in multiple transcript variants encoding distinct isoforms.
ENSG00000112306 ribosomal protein S12 6206 RPS12 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S12E family of ribosomal proteins. It is located in the cytoplasm. Increased expression of this gene in colorectal cancers compared to matched normal colonic mucosa has been observed. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000197958 ribosomal protein L12 6136 RPL12 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L11P family of ribosomal proteins. It is located in the cytoplasm. The protein binds directly to the 26S rRNA. This gene is co-transcribed with the U65 snoRNA, which is located in its fourth intron. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000161016 ribosomal protein L8 6132 RPL8 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L2P family of ribosomal proteins. It is located in the cytoplasm. In rat, the protein associates with the 5.8S rRNA, very likely participates in the binding of aminoacyl-tRNA, and is a constituent of the elongation factor 2-binding site at the ribosomal subunit interface. Alternatively spliced transcript variants encoding the same protein exist. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000231500 ribosomal protein S18 6222 RPS18 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S13P family of ribosomal proteins. It is located in the cytoplasm. The gene product of the E. coli ortholog (ribosomal protein S13) is involved in the binding of fMet-tRNA, and thus, in the initiation of translation. This gene is an ortholog of mouse Ke3. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000138326 ribosomal protein S24 6229 RPS24 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S24E family of ribosomal proteins. It is located in the cytoplasm. Multiple transcript variants encoding different isoforms have been found for this gene. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Mutations in this gene result in Diamond-Blackfan anemia.
ENSG00000008988 ribosomal protein S20 6224 RPS20 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S10P family of ribosomal proteins. It is located in the cytoplasm. This gene is co-transcribed with the small nucleolar RNA gene U54, which is located in its second intron. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Two transcript variants encoding different isoforms have been identified for this gene.
ENSG00000204628 receptor for activated C kinase 1 10399 RACK1 NA
ENSG00000105193 ribosomal protein S16 6217 RPS16 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S9P family of ribosomal proteins. It is located in the cytoplasm. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000133392 myosin, heavy chain 11, smooth muscle 4629 MYH11 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified.
ENSG00000197756 ribosomal protein L37a 6168 RPL37A Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L37AE family of ribosomal proteins. It is located in the cytoplasm. The protein contains a C4-type zinc finger-like domain. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000144713 ribosomal protein L32 6161 RPL32 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L32E family of ribosomal proteins. It is located in the cytoplasm. Although some studies have mapped this gene to 3q13.3-q21, it is believed to map to 3p25-p24. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Alternatively spliced transcript variants encoding the same protein have been observed for this gene.
ENSG00000273149 NA ENSG00000273149 RP11-290D2.6 NA
ENSG00000145592 ribosomal protein L37 6167 RPL37 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L37E family of ribosomal proteins. It is located in the cytoplasm. The protein contains a C2C2-type zinc finger-like motif. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000063177 ribosomal protein L18 6141 RPL18 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a member of the L18E family of ribosomal proteins that is a component of the 60S subunit. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene.
ENSG00000164587 ribosomal protein S14 6208 RPS14 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S11P family of ribosomal proteins. It is located in the cytoplasm. Transcript variants utilizing alternative transcription initiation sites have been described in the literature. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. In Chinese hamster ovary cells, mutations in this gene can lead to resistance to emetine, a protein synthesis inhibitor. Multiple alternatively spliced transcript variants encoding the same protein have been found for this gene.
ENSG00000105373 glioma tumor suppressor candidate region gene 2 29997 GLTSCR2 NA
ENSG00000244398 NA ENSG00000244398 RP11-466H18.1 NA
ENSG00000232573 ribosomal protein L3 pseudogene 4 ENSG00000232573 RPL3P4 NA
ENSG00000171858 ribosomal protein S21 6227 RPS21 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S21E family of ribosomal proteins. It is located in the cytoplasm. Alternative splice variants that encode different protein isoforms have been described, but their existence has not been verified. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000143947 ribosomal protein S27a 6233 RPS27A Ubiquitin, a highly conserved protein that has a major role in targeting cellular proteins for degradation by the 26S proteasome, is synthesized as a precursor protein consisting of either polyubiquitin chains or a single ubiquitin fused to an unrelated protein. This gene encodes a fusion protein consisting of ubiquitin at the N terminus and ribosomal protein S27a at the C terminus. When expressed in yeast, the protein is post-translationally processed, generating free ubiquitin monomer and ribosomal protein S27a. Ribosomal protein S27a is a component of the 40S subunit of the ribosome and belongs to the S27AE family of ribosomal proteins. It contains C4-type zinc finger domains and is located in the cytoplasm. Pseudogenes derived from this gene are present in the genome. As with ribosomal protein S27a, ribosomal protein L40 is also synthesized as a fusion protein with ubiquitin; similarly, ribosomal protein S30 is synthesized as a fusion protein with the ubiquitin-like protein fubi. Multiple alternatively spliced transcript variants that encode the same proteins have been identified.
ENSG00000198755 ribosomal protein L10a 4736 RPL10A Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L1P family of ribosomal proteins. It is located in the cytoplasm. The expression of this gene is downregulated in the thymus by cyclosporin-A (CsA), an immunosuppressive drug. Studies in mice have shown that the expression of the ribosomal protein L10a gene is downregulated in neural precursor cells during development. This gene previously was referred to as NEDD6 (neural precursor cell expressed, developmentally downregulated 6), but it has been renamed RPL10A (ribosomal protein 10a). As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000141753 insulin like growth factor binding protein 4 3487 IGFBP4 This gene is a member of the insulin-like growth factor binding protein (IGFBP) family and encodes a protein with an IGFBP domain and a thyroglobulin type-I domain. The protein binds both insulin-like growth factors (IGFs) I and II and circulates in the plasma in both glycosylated and non-glycosylated forms. Binding of this protein prolongs the half-life of the IGFs and alters their interaction with cell surface receptors.
ENSG00000196205 eukaryotic translation elongation factor 1 alpha 1 pseudogene 5 ENSG00000196205 EEF1A1P5 NA
ENSG00000109475 ribosomal protein L34 6164 RPL34 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L34E family of ribosomal proteins. It is located in the cytoplasm. This gene originally was thought to be located at 17q21, but it has been mapped to 4q. Overexpression of this gene has been observed in some cancer cells. Alternative splicing results in multiple transcript variants, all encoding the same isoform. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000153002 carboxypeptidase B1 1360 CPB1 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma.
ENSG00000197971 myelin basic protein 4155 MBP The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes.
ENSG00000229344 MT-CO2 pseudogene 12 ENSG00000229344 MTCO2P12 NA
ENSG00000225630 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 2 pseudogene 28 ENSG00000225630 MTND2P28 NA
ENSG00000213442 ribosomal protein L18a pseudogene 3 ENSG00000213442 RPL18AP3 NA
ENSG00000227097 ribosomal protein S28 pseudogene 7 ENSG00000227097 RPS28P7 NA
ENSG00000168028 ribosomal protein SA 3921 RPSA Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Many of the effects of laminin are mediated through interactions with cell surface receptors. These receptors include members of the integrin family, as well as non-integrin laminin-binding proteins. This gene encodes a high-affinity, non-integrin family, laminin receptor 1. This receptor has been variously called 67 kD laminin receptor, 37 kD laminin receptor precursor (37LRP) and p40 ribosome-associated protein. The amino acid sequence of laminin receptor 1 is highly conserved through evolution, suggesting a key biological function. It has been observed that the level of the laminin receptor transcript is higher in colon carcinoma tissue and lung cancer cell line than their normal counterparts. Also, there is a correlation between the upregulation of this polypeptide in cancer cells and their invasive and metastatic phenotype. Multiple copies of this gene exist; however, most of them are pseudogenes thought to have arisen from retropositional events. Two alternatively spliced transcript variants encoding the same protein have been found for this gene.
ENSG00000188846 ribosomal protein L14 9045 RPL14 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L14E family of ribosomal proteins. It contains a basic region-leucine zipper (bZIP)-like domain. The protein is located in the cytoplasm. This gene contains a trinucleotide (GCT) repeat tract whose length is highly polymorphic; these triplet repeats result in a stretch of alanine residues in the encoded protein. Transcript variants utilizing alternative polyA signals and alternative 5’-terminal exons exist but all encode the same protein. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000166441 ribosomal protein L27a 6157 RPL27A Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L15P family of ribosomal proteins. It is located in the cytoplasm. Variable expression of this gene in colorectal cancers compared to adjacent normal tissues has been observed, although no correlation between the level of expression and the severity of the disease has been found. As is typical for genes encoding ribosomal proteins, multiple processed pseudogenes derived from this gene are dispersed through the genome.
ENSG00000186468 ribosomal protein S23 6228 RPS23 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S12P family of ribosomal proteins. It is located in the cytoplasm. The protein shares significant amino acid similarity with S. cerevisiae ribosomal protein S28. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000162244 ribosomal protein L29 6159 RPL29 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a cytoplasmic ribosomal protein that is a component of the 60S subunit. The protein belongs to the L29E family of ribosomal proteins. The protein is also a peripheral membrane protein expressed on the cell surface that directly binds heparin. Although this gene was previously reported to map to 3q29-qter, it is believed that it is located at 3p21.3-p21.2. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000155657 titin 7273 TTN This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, an N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma.
ENSG00000105372 ribosomal protein S19 6223 RPS19 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S19E family of ribosomal proteins. It is located in the cytoplasm. Mutations in this gene cause Diamond-Blackfan anemia (DBA), a constitutional erythroblastopenia characterized by absent or decreased erythroid precursors, in a subset of patients. This suggests a possible extra-ribosomal function for this gene in erythropoietic differentiation and proliferation, in addition to its ribosomal function. Higher expression levels of this gene in some primary colon carcinomas compared to matched normal colon tissues have been observed. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000167996 ferritin heavy chain 1 2495 FTH1 This gene encodes the heavy subunit of ferritin, the major intracellular iron storage protein in prokaryotes and eukaryotes. It is composed of 24 subunits of the heavy and light ferritin chains. Variation in ferritin subunit composition may affect the rates of iron uptake and release in different tissues. A major function of ferritin is the storage of iron in a soluble and nontoxic state. Defects in ferritin proteins are associated with several neurodegenerative diseases. This gene has multiple pseudogenes. Several alternatively spliced transcript variants have been observed, but their biological validity has not been determined.
ENSG00000156482 ribosomal protein L30 6156 RPL30 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L30E family of ribosomal proteins. It is located in the cytoplasm. This gene is co-transcribed with the U72 small nucleolar RNA gene, which is located in its fourth intron. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000178035 IMP (inosine 5’-monophosphate) dehydrogenase 2 3615 IMPDH2 This gene encodes the rate-limiting enzyme in the de novo guanine nucleotide biosynthesis. It is thus involved in maintaining cellular guanine deoxy- and ribonucleotide pools needed for DNA and RNA synthesis. The encoded protein catalyzes the NAD-dependent oxidation of inosine-5’-monophosphate into xanthine-5’-monophosphate, which is then converted into guanosine-5’-monophosphate. This gene is up-regulated in some neoplasms, suggesting it may play a role in malignant transformation.
ENSG00000083845 ribosomal protein S5 6193 RPS5 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S7P family of ribosomal proteins. It is located in the cytoplasm. Variable expression of this gene in colorectal cancers compared to adjacent normal tissues has been observed, although no correlation between the level of expression and the severity of the disease has been found. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000182899 ribosomal protein L35a 6165 RPL35A Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L35AE family of ribosomal proteins. It is located in the cytoplasm. The rat protein has been shown to bind to both initiator and elongator tRNAs, and thus, it is located at the P site, or P and A sites, of the ribosome. Although this gene was originally mapped to chromosome 18, it has been established that it is located at 3q29-qter. Alternative splicing results in multiple transcript variants. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000213741 ribosomal protein S29 6235 RPS29 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit and a member of the S14P family of ribosomal proteins. The protein, which contains a C2-C2 zinc finger-like domain that can bind to zinc, can enhance the tumor suppressor activity of Ras-related protein 1A (KREV1). It is located in the cytoplasm. Variable expression of this gene in colorectal cancers compared to adjacent normal tissues has been observed, although no correlation between the level of expression and the severity of the disease has been found. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Alternatively spliced transcript variants encoding different isoforms have been found for this gene.
ENSG00000091704 carboxypeptidase A1 1357 CPA1 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer.
ENSG00000170835 carboxyl ester lipase 1056 CEL The protein encoded by this gene is a glycoprotein secreted from the pancreas into the digestive tract and from the lactating mammary gland into human milk. The physiological role of this protein is in cholesterol and lipid-soluble vitamin ester hydrolysis and absorption. This encoded protein promotes large chylomicron production in the intestine. Also its presence in plasma suggests its interactions with cholesterol and oxidized lipoproteins to modulate the progression of atherosclerosis. In pancreatic tumoral cells, this encoded protein is thought to be sequestrated within the Golgi compartment and is probably not secreted. This gene contains a variable number of tandem repeat (VNTR) polymorphism in the coding region that may influence the function of the encoded protein.
ENSG00000122406 ribosomal protein L5 6125 RPL5 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L18P family of ribosomal proteins. It is located in the cytoplasm. The protein binds 5S rRNA to form a stable complex called the 5S ribonucleoprotein particle (RNP), which is necessary for the transport of nonribosome-associated cytoplasmic 5S rRNA to the nucleolus for assembly into ribosomes. The protein interacts specifically with the beta subunit of casein kinase II. Variable expression of this gene in colorectal cancers compared to adjacent normal tissues has been observed, although no correlation between the level of expression and the severity of the disease has been found. This gene is co-transcribed with the small nucleolar RNA gene U21, which is located in its fifth intron. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000184009 actin gamma 1 71 ACTG1 Actins are highly conserved proteins that are involved in various types of cell motility, and maintenance of the cytoskeleton. In vertebrates, three main groups of actin isoforms, alpha, beta and gamma have been identified. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton, and as mediators of internal cell motility. Actin, gamma 1, encoded by this gene, is a cytoplasmic actin found in non-muscle cells. Mutations in this gene are associated with DFNA20/26, a subtype of autosomal dominant non-syndromic sensorineural progressive hearing loss. Alternative splicing results in multiple transcript variants.
ENSG00000175061 LRRC75A antisense RNA 1 125144 LRRC75A-AS1 NA
ENSG00000115268 ribosomal protein S15 6209 RPS15 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S19P family of ribosomal proteins. It is located in the cytoplasm. This gene has been found to be activated in various tumors, such as insulinomas, esophageal cancers, and colon cancers. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Alternative splicing results in multiple transcript variants.
ENSG00000114942 eukaryotic translation elongation factor 1 beta 2 1933 EEF1B2 This gene encodes a translation elongation factor. The protein is a guanine nucleotide exchange factor involved in the transfer of aminoacylated tRNAs to the ribosome. Alternative splicing results in three transcript variants which differ only in the 5’ UTR.
ENSG00000242071 ribosomal protein L7a pseudogene 6 ENSG00000242071 RPL7AP6 NA
ENSG00000204983 protease, serine 1 5644 PRSS1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7.
ENSG00000169347 glycoprotein 2 2813 GP2 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants.
ENSG00000240342 ribosomal protein S2 pseudogene 5 ENSG00000240342 RPS2P5 NA
ENSG00000172809 ribosomal protein L38 6169 RPL38 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L38E family of ribosomal proteins. It is located in the cytoplasm. Alternative splice variants have been identified, both encoding the same protein. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome, including one located in the promoter region of the type 1 angiotensin II receptor gene.
ENSG00000136942 ribosomal protein L35 11224 RPL35 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L29P family of ribosomal proteins. It is located in the cytoplasm. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000182718 annexin A2 302 ANXA2 This gene encodes a member of the annexin family. Members of this calcium-dependent phospholipid-binding protein family play a role in the regulation of cellular growth and in signal transduction pathways. This protein functions as an autocrine factor which heightens osteoclast formation and bone resorption. This gene has three pseudogenes located on chromosomes 4, 9 and 10, respectively. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene.
ENSG00000171476 HOP homeobox 84525 HOPX The protein encoded by this gene is a homeodomain protein that lacks certain conserved residues required for DNA binding. It was reported that choriocarcinoma cell lines and tissues failed to express this gene, which suggested the possible involvement of this gene in malignant conversion of placental trophoblasts. Studies in mice suggest that this protein may interact with serum response factor (SRF) and modulate SRF-dependent cardiac-specific gene expression and cardiac development. Multiple alternatively spliced transcript variants have been identified for this gene.
ENSG00000175206 natriuretic peptide A 4878 NPPA The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1.
ENSG00000110492 midkine (neurite growth-promoting factor 2) 4192 MDK This gene encodes a member of a small family of secreted growth factors that binds heparin and responds to retinoic acid. The encoded protein promotes cell growth, migration, and angiogenesis, in particular during tumorigenesis. This gene has been targeted as a therapeutic for a variety of different disorders. Alternatively spliced transcript variants encoding multiple isoforms have been observed.
ENSG00000229117 ribosomal protein L41 6171 RPL41 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein, which shares sequence similarity with the yeast ribosomal protein YL41, belongs to the L41E family of ribosomal proteins. It is located in the cytoplasm. The protein can interact with the beta subunit of protein kinase CKII and can stimulate the phosphorylation of DNA topoisomerase II-alpha by CKII. Two alternative splice variants have been identified, both encoding the same protein. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000137970 ribosomal protein L7 pseudogene 9 ENSG00000137970 RPL7P9 NA
ENSG00000131469 ribosomal protein L27 6155 RPL27 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L27E family of ribosomal proteins. It is located in the cytoplasm. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000108821 collagen type I alpha 1 1277 COL1A1 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene.
ENSG00000151729 solute carrier family 25 member 4 291 SLC25A4 This gene is a member of the mitochondrial carrier subfamily of solute carrier protein genes. The product of this gene functions as a gated pore that translocates ADP from the cytoplasm into the mitochondrial matrix and ATP from the mitochondrial matrix into the cytoplasm. The protein forms a homodimer embedded in the inner mitochondrial membrane. Mutations in this gene have been shown to result in autosomal dominant progressive external ophthalmoplegia and familial hypertrophic cardiomyopathy.
ENSG00000167658 eukaryotic translation elongation factor 2 1938 EEF2 This gene encodes a member of the GTP-binding translation elongation factor family. This protein is an essential factor for protein synthesis. It promotes the GTP-dependent translocation of the nascent protein chain from the A-site to the P-site of the ribosome. This protein is completely inactivated by EF-2 kinase phosphorylation.
ENSG00000142789 chymotrypsin like elastase family member 3A 10136 CELA3A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1.
ENSG00000213553 ribosomal protein, large, P0 pseudogene 6 ENSG00000213553 RPLP0P6 NA
ENSG00000171863 ribosomal protein S7 6201 RPS7 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S7E family of ribosomal proteins. It is located in the cytoplasm. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000266844 NA ENSG00000266844 RP11-862L9.3 NA
ENSG00000227081 NA ENSG00000227081 RP11-543P15.1 NA
ENSG00000230202 NA ENSG00000230202 RP11-632C17__A.1 NA
ENSG00000161970 ribosomal protein L26 6154 RPL26 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L24P family of ribosomal proteins. It is located in the cytoplasm. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Mutations in this gene result in Diamond-Blackfan anemia. Alternative splicing results in multiple transcript variants.
ENSG00000105640 ribosomal protein L18a 6142 RPL18A Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a member of the L18AE family of ribosomal proteins that is a component of the 60S subunit. The encoded protein may play a role in viral replication by interacting with the hepatitis C virus internal ribosome entry site (IRES). This gene is co-transcribed with the U68 snoRNA, located within the third intron. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed throughout the genome.
ENSG00000118181 ribosomal protein S25 6230 RPS25 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S25E family of ribosomal proteins. It is located in the cytoplasm. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000234851 ribosomal protein L23a pseudogene 42 ENSG00000234851 RPL23AP42 NA
ENSG00000234797 ribosomal protein S3A pseudogene 6 ENSG00000234797 RPS3AP6 NA
# Save the Ensembl IDs queried for factor 10, one per line.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 10, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
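
A query result can hold more than one row per queried ID, as well as rows flagged `notfound`; both cases occur in the Factor 11 table that follows (ENSG00000112096 appears twice, ENSG00000117289 has no hit). Writing `out$query` directly carries those rows into the saved list. The sketch below is one possible way to collapse the result to unique, matched IDs before saving. It is not part of the original analysis: `clean_query_ids` is a hypothetical helper, the data frame is a small mock-up rather than a live `queryMany` result, and the output path is a temporary file chosen for illustration.

```r
# Hypothetical helper (an assumption, not from the original analysis):
# split a queryMany-style result into unique matched IDs and unmatched IDs.
clean_query_ids <- function(res) {
  res <- as.data.frame(res)
  # The 'notfound' column is only present when at least one query had no hit.
  if ("notfound" %in% names(res)) {
    missing <- !is.na(res$notfound) & res$notfound
  } else {
    missing <- rep(FALSE, nrow(res))
  }
  list(
    ids      = unique(as.character(res$query[!missing])),
    notfound = unique(as.character(res$query[missing]))
  )
}

# Mock result: one duplicated query and one unmatched query
# (IDs copied from the Factor 11 table).
mock <- data.frame(
  query    = c("ENSG00000163017", "ENSG00000112096",
               "ENSG00000112096", "ENSG00000117289"),
  symbol   = c("ACTG2", "LOC100129518", "SOD2", NA),
  notfound = c(NA, NA, NA, TRUE),
  stringsAsFactors = FALSE
)

cleaned <- clean_query_ids(mock)

# Write the cleaned IDs to a temporary file, one per line.
out_file <- file.path(tempdir(), "gene_names_clus_example.txt")
write.table(cleaned$ids, out_file,
            col.names = FALSE, row.names = FALSE, quote = FALSE)
```

Whether unmatched IDs should be dropped depends on how the saved lists are used downstream; keeping them in a separate `notfound` vector leaves that choice open.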

Factor 11 Annotations

out <- mygene::queryMany(gene_list[11,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id name summary symbol query notfound
72 actin, gamma 2, smooth muscle, enteric Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. ACTG2 ENSG00000163017 NA
165 AE binding protein 1 This gene encodes a member of carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. AEBP1 ENSG00000106624 NA
7431 vimentin This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. VIM ENSG00000026025 NA
3849 keratin 2 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. KRT2 ENSG00000172867 NA
4629 myosin, heavy chain 11, smooth muscle The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. MYH11 ENSG00000133392 NA
60 actin, beta This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. ACTB ENSG00000075624 NA
7038 thyroglobulin Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. TG ENSG00000042832 NA
NA NA NA NA ENSG00000117289 TRUE
1832 desmoplakin This gene encodes a protein that anchors intermediate filaments to desmosomal plaques and forms an obligate component of functional desmosomes. Mutations in this gene are the cause of several cardiomyopathies and keratodermas, including skin fragility-woolly hair syndrome. Alternative splicing results in multiple transcript variants. DSP ENSG00000096696 NA
ENSG00000225630 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 2 pseudogene 28 NA MTND2P28 ENSG00000225630 NA
2194 fatty acid synthase The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. FASN ENSG00000169710 NA
100129518 uncharacterized LOC100129518 NA LOC100129518 ENSG00000112096 NA
6648 superoxide dismutase 2, mitochondrial This gene is a member of the iron/manganese superoxide dismutase family. It encodes a mitochondrial protein that forms a homotetramer and binds one manganese ion per subunit. This protein binds to the superoxide byproducts of oxidative phosphorylation and converts them to hydrogen peroxide and diatomic oxygen. Mutations in this gene have been associated with idiopathic cardiomyopathy (IDC), premature aging, sporadic motor neuron disease, and cancer. Alternative splicing of this gene results in multiple transcript variants. A related pseudogene has been identified on chromosome 1. SOD2 ENSG00000112096 NA
1264 calponin 1 NA CNN1 ENSG00000130176 NA
3858 keratin 10 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. KRT10 ENSG00000186395 NA
7273 titin This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. TTN ENSG00000155657 NA
8497 PTPRF interacting protein alpha 4 PPFIA4, or liprin-alpha-4, belongs to the liprin-alpha gene family. See liprin-alpha-1 (LIP1, or PPFIA1; MIM 611054) for background on liprins. PPFIA4 ENSG00000143847 NA
ENSG00000269936 NA NA RP11-394O4.5 ENSG00000269936 NA
5284 polymeric immunoglobulin receptor This gene is a member of the immunoglobulin superfamily. The encoded poly-Ig receptor binds polymeric immunoglobulin molecules at the basolateral surface of epithelial cells; the complex is then transported across the cell to be secreted at the apical surface. A significant association was found between immunoglobulin A nephropathy and several SNPs in this gene. PIGR ENSG00000162896 NA
6319 stearoyl-CoA desaturase This gene encodes an enzyme involved in fatty acid biosynthesis, primarily the synthesis of oleic acid. The protein belongs to the fatty acid desaturase family and is an integral membrane protein located in the endoplasmic reticulum. Transcripts of approximately 3.9 and 5.2 kb, differing only by alternative polyadenylation signals, have been detected. A gene encoding a similar enzyme is located on chromosome 4 and a pseudogene of this gene is located on chromosome 17. SCD ENSG00000099194 NA
4627 myosin, heavy chain 9, non-muscle This gene encodes a conventional non-muscle myosin; this protein should not be confused with the unconventional myosin-9a or 9b (MYO9A or MYO9B). The encoded protein is a myosin IIA heavy chain that contains an IQ domain and a myosin head-like domain which is involved in several important functions, including cytokinesis, cell motility and maintenance of cell shape. Defects in this gene have been associated with non-syndromic sensorineural deafness autosomal dominant type 17, Epstein syndrome, Alport syndrome with macrothrombocytopenia, Sebastian syndrome, Fechtner syndrome and macrothrombocytopenia with progressive sensorineural deafness. MYH9 ENSG00000100345 NA
70 actin, alpha, cardiac muscle 1 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). ACTC1 ENSG00000159251 NA
4604 myosin binding protein C, slow type This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. MYBPC1 ENSG00000196091 NA
4052 latent transforming growth factor beta binding protein 1 The protein encoded by this gene belongs to the family of latent TGF-beta binding proteins (LTBPs). The secretion and activation of TGF-betas is regulated by their association with latency-associated proteins and with latent TGF-beta binding proteins. The product of this gene targets latent complexes of transforming growth factor beta to the extracellular matrix, where the latent cytokine is subsequently activated by several different mechanisms. Alternatively spliced transcript variants encoding different isoforms have been identified. LTBP1 ENSG00000049323 NA
729238 surfactant protein A2 This gene is one of several genes encoding pulmonary-surfactant associated proteins (SFTPA) located on chromosome 10. Mutations in this gene and a highly similar gene located nearby, which affect the highly conserved carbohydrate recognition domain, are associated with idiopathic pulmonary fibrosis. The current version of the assembly displays only a single centromeric SFTPA gene pair rather than the two gene pairs shown in the previous assembly which were thought to have resulted from a duplication. SFTPA2 ENSG00000185303 NA
1363 carboxypeptidase E This gene encodes a member of the M14 family of metallocarboxypeptidases. The encoded preproprotein is proteolytically processed to generate the mature peptidase. This peripheral membrane protein cleaves C-terminal amino acid residues and is involved in the biosynthesis of peptide hormones and neurotransmitters, including insulin. This protein may also function independently of its peptidase activity, as a neurotrophic factor that promotes neuronal survival, and as a sorting receptor that binds to regulated secretory pathway proteins, including prohormones. Mutations in this gene are implicated in type 2 diabetes. CPE ENSG00000109472 NA
3040 hemoglobin subunit alpha 2 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. HBA2 ENSG00000188536 NA
2027 enolase 3 This gene encodes one of the three enolase isoenzymes found in mammals. This isoenzyme is found in skeletal muscle cells in the adult where it may play a role in muscle development and regeneration. A switch from alpha enolase to beta enolase occurs in muscle tissue during development in rodents. Mutations in this gene have been associated with glycogen storage disease. Alternatively spliced transcript variants encoding different isoforms have been described. ENO3 ENSG00000108515 NA
6440 surfactant protein C This gene encodes the pulmonary-associated surfactant protein C (SPC), an extremely hydrophobic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 2, also called pulmonary alveolar proteinosis due to surfactant protein C deficiency, and are associated with interstitial lung disease in older infants, children, and adults. Alternatively spliced transcript variants encoding different protein isoforms have been identified. SFTPC ENSG00000168484 NA
347 apolipoprotein D This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. APOD ENSG00000189058 NA
27063 ankyrin repeat domain 1 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. ANKRD1 ENSG00000148677 NA
4624 myosin, heavy chain 6, cardiac muscle, alpha Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. MYH6 ENSG00000197616 NA
653509 surfactant protein A1 This gene encodes a lung surfactant protein that is a member of a subfamily of C-type lectins called collectins. The encoded protein binds specific carbohydrate moieties found on lipids and on the surface of microorganisms. This protein plays an essential role in surfactant homeostasis and in the defense against respiratory pathogens. Mutations in this gene are associated with idiopathic pulmonary fibrosis. Alternate splicing results in multiple transcript variants. SFTPA1 ENSG00000122852 NA
1465 cysteine and glycine rich protein 1 This gene encodes a member of the cysteine-rich protein (CSRP) family. This gene family includes a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this gene product occurs in proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Alternatively spliced transcript variants have been described. CSRP1 ENSG00000159176 NA
1056 carboxyl ester lipase The protein encoded by this gene is a glycoprotein secreted from the pancreas into the digestive tract and from the lactating mammary gland into human milk. The physiological role of this protein is in cholesterol and lipid-soluble vitamin ester hydrolysis and absorption. This encoded protein promotes large chylomicron production in the intestine. Also its presence in plasma suggests its interactions with cholesterol and oxidized lipoproteins to modulate the progression of atherosclerosis. In pancreatic tumoral cells, this encoded protein is thought to be sequestrated within the Golgi compartment and is probably not secreted. This gene contains a variable number of tandem repeat (VNTR) polymorphism in the coding region that may influence the function of the encoded protein. CEL ENSG00000170835 NA
3490 insulin like growth factor binding protein 7 This gene encodes a member of the insulin-like growth factor (IGF)-binding protein (IGFBP) family. IGFBPs bind IGFs with high affinity, and regulate IGF availability in body fluids and tissues and modulate IGF binding to its receptors. This protein binds IGF-I and IGF-II with relatively low affinity, and belongs to a subfamily of low-affinity IGFBPs. It also stimulates prostacyclin production and cell adhesion. Alternatively spliced transcript variants encoding different isoforms have been described for this gene, and one variant has been associated with retinal arterial macroaneurysm (PMID:21835307). IGFBP7 ENSG00000163453 NA
125 alcohol dehydrogenase 1B (class I), beta polypeptide The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. ADH1B ENSG00000196616 NA
80781 collagen type XVIII alpha 1 chain This gene encodes the alpha chain of type XVIII collagen. This collagen is one of the multiplexins, extracellular matrix proteins that contain multiple triple-helix domains (collagenous domains) interrupted by non-collagenous domains. A long isoform of the protein has an N-terminal domain that is homologous to the extracellular part of frizzled receptors. Proteolytic processing at several endogenous cleavage sites in the C-terminal domain results in production of endostatin, a potent antiangiogenic protein that is able to inhibit angiogenesis and tumor growth. Mutations in this gene are associated with Knobloch syndrome. The main features of this syndrome involve retinal abnormalities, so type XVIII collagen may play an important role in retinal structure and in neural tube closure. Alternative splicing results in multiple transcript variants. COL18A1 ENSG00000182871 NA
3304 heat shock protein family A (Hsp70) member 1B This intronless gene encodes a 70kDa heat shock protein which is a member of the heat shock protein 70 family. In conjunction with other heat shock proteins, this protein stabilizes existing proteins against aggregation and mediates the folding of newly translated proteins in the cytosol and in organelles. It is also involved in the ubiquitin-proteasome pathway through interaction with the AU-rich element RNA-binding protein 1. The gene is located in the major histocompatibility complex class III region, in a cluster with two closely related genes which encode similar proteins. HSPA1B ENSG00000204388 NA
4619 myosin, heavy chain 1, skeletal muscle, adult Myosin is a major contractile protein which converts chemical energy into mechanical energy through the hydrolysis of ATP. Myosin is a hexameric protein composed of a pair of myosin heavy chains (MYH) and two pairs of nonidentical light chains. Myosin heavy chains are encoded by a multigene family. In mammals at least 10 different myosin heavy chain (MYH) isoforms have been described from striated, smooth, and nonmuscle cells. These isoforms show expression that is spatially and temporally regulated during development. MYH1 ENSG00000109061 NA
1938 eukaryotic translation elongation factor 2 This gene encodes a member of the GTP-binding translation elongation factor family. This protein is an essential factor for protein synthesis. It promotes the GTP-dependent translocation of the nascent protein chain from the A-site to the P-site of the ribosome. This protein is completely inactivated by EF-2 kinase phosphorylation. EEF2 ENSG00000167658 NA
6876 transgelin The protein encoded by this gene is a transformation and shape-change sensitive actin cross-linking/gelling protein found in fibroblasts and smooth muscle. Its expression is down-regulated in many cell lines, and this down-regulation may be an early and sensitive marker for the onset of transformation. A functional role of this protein is unclear. Two transcript variants encoding the same protein have been found for this gene. TAGLN ENSG00000149591 NA
677 ZFP36 ring finger protein-like 1 This gene is a member of the TIS11 family of early response genes, which are induced by various agonists such as the phorbol ester TPA and the polypeptide mitogen EGF. This gene is well conserved across species and has a promoter that contains motifs seen in other early-response genes. The encoded protein contains a distinguishing putative zinc finger domain with a repeating cys-his motif. This putative nuclear transcription factor most likely functions in regulating the response to growth factors. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ZFP36L1 ENSG00000185650 NA
3860 keratin 13 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. KRT13 ENSG00000171401 NA
ENSG00000237973 MT-CO1 pseudogene 12 NA MTCO1P12 ENSG00000237973 NA
4638 myosin light chain kinase This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3’ region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts. MYLK ENSG00000065534 NA
23336 synemin The protein encoded by this gene is an intermediate filament (IF) family member. IF proteins are cytoskeletal proteins that confer resistance to mechanical stress and are encoded by a dispersed multigene family. This protein has been found to form a linkage between desmin, which is a subunit of the IF network, and the extracellular matrix, and provides an important structural support in muscle. Two alternatively spliced variants encoding different isoforms have been described for this gene. SYNM ENSG00000182253 NA
5346 perilipin 1 The protein encoded by this gene coats lipid storage droplets in adipocytes, thereby protecting them until they can be broken down by hormone-sensitive lipase. The encoded protein is the major cAMP-dependent protein kinase substrate in adipocytes and, when unphosphorylated, may play a role in the inhibition of lipolysis. Alternatively spliced transcript variants varying in the 5’ UTR, but encoding the same protein, have been found for this gene. PLIN1 ENSG00000166819 NA
4014 loricrin This gene encodes loricrin, a major protein component of the cornified cell envelope found in terminally differentiated epidermal cells. Mutations in this gene are associated with Vohwinkel’s syndrome and progressive symmetric erythrokeratoderma, both inherited skin diseases. LOR ENSG00000203782 NA
2670 glial fibrillary acidic protein This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. GFAP ENSG00000131095 NA
2878 glutathione peroxidase 3 This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. GPX3 ENSG00000211445 NA
ENSG00000211896 immunoglobulin heavy constant gamma 1 (G1m marker) NA IGHG1 ENSG00000211896 NA
3983 actin binding LIM protein 1 This gene encodes a cytoskeletal LIM protein that binds to actin filaments via a domain that is homologous to erythrocyte dematin. LIM domains, found in over 60 proteins, play key roles in the regulation of developmental pathways. LIM domains also function as protein-binding interfaces, mediating specific protein-protein interactions. The protein encoded by this gene could mediate such interactions between actin filaments and cytoplasmic targets. Alternatively spliced transcript variants encoding different isoforms have been identified. ABLIM1 ENSG00000099204 NA
348 apolipoprotein E The protein encoded by this gene is a major apoprotein of the chylomicron. It binds to a specific liver and peripheral cell receptor, and is essential for the normal catabolism of triglyceride-rich lipoprotein constituents. This gene maps to chromosome 19 in a cluster with the related apolipoprotein C1 and C2 genes. Mutations in this gene result in familial dysbetalipoproteinemia, or type III hyperlipoproteinemia (HLP III), in which increased plasma cholesterol and triglycerides are the consequence of impaired clearance of chylomicron and VLDL remnants. Alternative splicing results in multiple transcript variants. APOE ENSG00000130203 NA
3039 hemoglobin subunit alpha 1 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. HBA1 ENSG00000206172 NA
4359 myelin protein zero This gene is specifically expressed in Schwann cells of the peripheral nervous system and encodes a type I transmembrane glycoprotein that is a major structural protein of the peripheral myelin sheath. The encoded protein contains a large hydrophobic extracellular domain and a smaller basic intracellular domain, which are essential for the formation and stabilization of the multilamellar structure of the compact myelin. Mutations in this gene are associated with autosomal dominant form of Charcot-Marie-Tooth disease type 1 (CMT1B) and other polyneuropathies, such as Dejerine-Sottas syndrome (DSS) and congenital hypomyelinating neuropathy (CHN). A recent study showed that two isoforms are produced from the same mRNA by use of alternative in-frame translation termination codons via a stop codon readthrough mechanism. MPZ ENSG00000158887 NA
1291 collagen type VI alpha 1 The collagens are a superfamily of proteins that play a role in maintaining the integrity of various tissues. Collagens are extracellular matrix proteins and have a triple-helical domain as their common structural element. Collagen VI is a major structural component of microfibrils. The basic structural unit of collagen VI is a heterotrimer of the alpha1(VI), alpha2(VI), and alpha3(VI) chains. The alpha2(VI) and alpha3(VI) chains are encoded by the COL6A2 and COL6A3 genes, respectively. The protein encoded by this gene is the alpha 1 subunit of type VI collagen (alpha1(VI) chain). Mutations in the genes that code for the collagen VI subunits result in the autosomal dominant disorder, Bethlem myopathy. COL6A1 ENSG00000142156 NA
2819 glycerol-3-phosphate dehydrogenase 1 This gene encodes a member of the NAD-dependent glycerol-3-phosphate dehydrogenase family. The encoded protein plays a critical role in carbohydrate and lipid metabolism by catalyzing the reversible conversion of dihydroxyacetone phosphate (DHAP) and reduced nicotinamide adenine dinucleotide (NADH) to glycerol-3-phosphate (G3P) and NAD+. The encoded cytosolic protein and mitochondrial glycerol-3-phosphate dehydrogenase also form a glycerol phosphate shuttle that facilitates the transfer of reducing equivalents from the cytosol to mitochondria. Mutations in this gene are a cause of transient infantile hypertriglyceridemia. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. GPD1 ENSG00000167588 NA
5265 serpin family A member 1 The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. SERPINA1 ENSG00000197249 NA
266727 MAM domain containing glycosylphosphatidylinositol anchor 1 NA MDGA1 ENSG00000112139 NA
22936 elongation factor for RNA polymerase II 2 NA ELL2 ENSG00000118985 NA
7139 troponin T2, cardiac type The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms; however, the full-length nature of some of these variants has not yet been determined. TNNT2 ENSG00000118194 NA
112399 egl-9 family hypoxia inducible factor 3 NA EGLN3 ENSG00000129521 NA
10867 tetraspanin 9 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. Alternatively spliced transcripts encoding the same protein have been identified. TSPAN9 ENSG00000011105 NA
1358 carboxypeptidase A2 Three different forms of human pancreatic procarboxypeptidase A have been isolated. The encoded protein represents the A2 form, which is a monomeric protein with different biochemical properties from the A1 and A3 forms. The A2 form of pancreatic procarboxypeptidase acts on aromatic C-terminal residues and is a secreted protein. CPA2 ENSG00000158516 NA
84168 anthrax toxin receptor 1 This gene encodes a type I transmembrane protein and is a tumor-specific endothelial marker that has been implicated in colorectal cancer. The encoded protein has been shown to also be a docking protein or receptor for Bacillus anthracis toxin, the causative agent of the disease, anthrax. The binding of the protective antigen (PA) component of the tripartite anthrax toxin to this receptor protein mediates delivery of toxin components to the cytosol of cells. Once inside the cell, the other two components of anthrax toxin, edema factor (EF) and lethal factor (LF), disrupt normal cellular processes. Three alternatively spliced variants that encode different protein isoforms have been described. ANTXR1 ENSG00000169604 NA
7169 tropomyosin 2 (beta) This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. TPM2 ENSG00000198467 NA
57326 PBX homeobox interacting protein 1 The protein encoded by this gene interacts with the PBX1 homeodomain protein, inhibiting its transcriptional activation potential by preventing its binding to DNA. The encoded protein, which is primarily cytosolic but can shuttle to the nucleus, also can interact with estrogen receptors alpha and beta and promote the proliferation of breast cancer, brain tumors, and lung cancer. Several transcript variants encoding different isoforms have been found for this gene. More variants exist, but their full-length natures have yet to be determined. PBXIP1 ENSG00000163346 NA
ENSG00000211890 immunoglobulin heavy constant alpha 2 (A2m marker) NA IGHA2 ENSG00000211890 NA
8828 neuropilin 2 This gene encodes a member of the neuropilin family of receptor proteins. The encoded transmembrane protein binds to SEMA3C protein {sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3C} and SEMA3F protein {sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3F}, and interacts with vascular endothelial growth factor (VEGF). This protein may play a role in cardiovascular development, axon guidance, and tumorigenesis. Multiple transcript variants encoding distinct isoforms have been identified for this gene. NRP2 ENSG00000118257 NA
7849 paired box 8 This gene encodes a member of the paired box (PAX) family of transcription factors. Members of this gene family typically encode proteins that contain a paired box domain, an octapeptide, and a paired-type homeodomain. This nuclear protein is involved in thyroid follicular cell development and expression of thyroid-specific genes. Mutations in this gene have been associated with thyroid dysgenesis, thyroid follicular carcinomas and atypical follicular thyroid adenomas. Alternatively spliced transcript variants encoding different isoforms have been described. PAX8 ENSG00000125618 NA
ENSG00000225972 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 1 pseudogene 23 NA MTND1P23 ENSG00000225972 NA
1410 crystallin alpha B Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. CRYAB ENSG00000109846 NA
4878 natriuretic peptide A The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. NPPA ENSG00000175206 NA
2938 glutathione S-transferase alpha 1 This gene encodes a member of a family of enzymes that function to add glutathione to target electrophilic compounds, including carcinogens, therapeutic drugs, environmental toxins, and products of oxidative stress. This action is an important step in detoxification of these compounds. This subfamily of enzymes has a particular role in protecting cells from reactive oxygen species and the products of peroxidation. Polymorphisms in this gene influence the ability of individuals to metabolize different drugs. This gene is located in a cluster of similar genes and pseudogenes on chromosome 6. Alternative splicing results in multiple transcript variants. GSTA1 ENSG00000243955 NA
3778 potassium calcium-activated channel subfamily M alpha 1 MaxiK channels are large conductance, voltage and calcium-sensitive potassium channels which are fundamental to the control of smooth muscle tone and neuronal excitability. MaxiK channels can be formed by 2 subunits: the pore-forming alpha subunit, which is the product of this gene, and the modulatory beta subunit. Intracellular calcium regulates the physical association between the alpha and beta subunits. Alternatively spliced transcript variants encoding different isoforms have been identified. KCNMA1 ENSG00000156113 NA
7077 TIMP metallopeptidase inhibitor 2 This gene is a member of the TIMP gene family. The proteins encoded by this gene family are natural inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix. In addition to an inhibitory role against metalloproteinases, the encoded protein has a unique role among TIMP family members in its ability to directly suppress the proliferation of endothelial cells. As a result, the encoded protein may be critical to the maintenance of tissue homeostasis by suppressing the proliferation of quiescent tissues in response to angiogenic factors, and by inhibiting protease activity in tissues undergoing remodelling of the extracellular matrix. TIMP2 ENSG00000035862 NA
4703 nebulin This gene encodes nebulin, a giant protein component of the cytoskeletal matrix that coexists with the thick and thin filaments within the sarcomeres of skeletal muscle. In most vertebrates, nebulin accounts for 3 to 4% of the total myofibrillar protein. The encoded protein contains approximately 30-amino acid long modules that can be classified into 7 types and other repeated modules. Protein isoform sizes vary from 600 to 800 kD due to alternative splicing that is tissue-, species-, and developmental stage-specific. Of the 183 exons in the nebulin gene, at least 43 are alternatively spliced, although exons 143 and 144 are not found in the same transcript. Of the several thousand transcript variants predicted for nebulin, the RefSeq Project has decided to create three representative RefSeq records. Mutations in this gene are associated with recessive nemaline myopathy. NEB ENSG00000183091 NA
64065 PERP, TP53 apoptosis effector NA PERP ENSG00000112378 NA
26112 coiled-coil domain containing 69 NA CCDC69 ENSG00000198624 NA
ENSG00000234961 NA NA RP11-124N14.3 ENSG00000234961 NA
1292 collagen type VI alpha 2 This gene encodes one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The product of this gene contains several domains similar to von Willebrand Factor type A domains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in this gene are associated with Bethlem myopathy and Ullrich scleroatonic muscular dystrophy. Three transcript variants have been identified for this gene. COL6A2 ENSG00000142173 NA
23231 SEL1L family member 3 NA SEL1L3 ENSG00000091490 NA
6280 S100 calcium binding protein A9 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. S100A9 ENSG00000163220 NA
ENSG00000263335 NA NA AF001548.5 ENSG00000263335 NA
63924 cell death inducing DFFA like effector c This gene encodes a member of the cell death-inducing DNA fragmentation factor-like effector family. Members of this family play important roles in apoptosis. The encoded protein promotes lipid droplet formation in adipocytes and may mediate adipocyte apoptosis. This gene is regulated by insulin and its expression is positively correlated with insulin sensitivity. Mutations in this gene may contribute to insulin resistant diabetes. A pseudogene of this gene is located on the short arm of chromosome 3. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. CIDEC ENSG00000187288 NA
254428 solute carrier family 41 member 1 NA SLC41A1 ENSG00000133065 NA
6770 steroidogenic acute regulatory protein The protein encoded by this gene plays a key role in the acute regulation of steroid hormone synthesis by enhancing the conversion of cholesterol into pregnenolone. This protein permits the cleavage of cholesterol into pregnenolone by mediating the transport of cholesterol from the outer mitochondrial membrane to the inner mitochondrial membrane. Mutations in this gene are a cause of congenital lipoid adrenal hyperplasia (CLAH), also called lipoid CAH. A pseudogene of this gene is located on chromosome 13. STAR ENSG00000147465 NA
4257 microsomal glutathione S-transferase 1 The MAPEG (Membrane Associated Proteins in Eicosanoid and Glutathione metabolism) family consists of six human proteins, two of which are involved in the production of leukotrienes and prostaglandin E, important mediators of inflammation. Other family members, demonstrating glutathione S-transferase and peroxidase activities, are involved in cellular defense against toxic, carcinogenic, and pharmacologically active electrophilic compounds. This gene encodes a protein that catalyzes the conjugation of glutathione to electrophiles and the reduction of lipid hydroperoxides. This protein is localized to the endoplasmic reticulum and outer mitochondrial membrane where it is thought to protect these membranes from oxidative stress. Several transcript variants, some non-protein coding and some protein coding, have been found for this gene. MGST1 ENSG00000008394 NA
5376 peripheral myelin protein 22 This gene encodes an integral membrane protein that is a major component of myelin in the peripheral nervous system. Studies suggest two alternately used promoters drive tissue-specific expression. Various mutations of this gene are causes of Charcot-Marie-Tooth disease Type IA, Dejerine-Sottas syndrome, and hereditary neuropathy with liability to pressure palsies. Alternative splicing results in multiple transcript variants. PMP22 ENSG00000109099 NA
10529 nebulette This gene encodes a nebulin like protein that is abundantly expressed in cardiac muscle. The encoded protein binds actin and interacts with thin filaments and Z-line associated proteins in striated muscle. This protein may be involved in cardiac myofibril assembly. A shorter isoform of this protein termed LIM nebulette is expressed in non-muscle cells and may function as a component of focal adhesion complexes. Alternate splicing results in multiple transcript variants. NEBL ENSG00000078114 NA
6867 transforming acidic coiled-coil containing protein 1 This locus may represent a breast cancer candidate gene. It is located close to FGFR1 on a region of chromosome 8 that is amplified in some breast cancers. Three transcript variants encoding different isoforms have been found for this gene. TACC1 ENSG00000147526 NA
4856 nephroblastoma overexpressed The protein encoded by this gene is a small secreted cysteine-rich protein and a member of the CCN family of regulatory proteins. CCN family proteins associate with the extracellular matrix and play an important role in cardiovascular and skeletal development, fibrosis and cancer development. NOV ENSG00000136999 NA
6281 S100 calcium binding protein A10 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in exocytosis and endocytosis. S100A10 ENSG00000197747 NA
811 calreticulin Calreticulin is a multifunctional protein that acts as a major Ca(2+)-binding (storage) protein in the lumen of the endoplasmic reticulum. It is also found in the nucleus, suggesting that it may have a role in transcription regulation. Calreticulin binds to the synthetic peptide KLGFFKR, which is almost identical to an amino acid sequence in the DNA-binding domain of the superfamily of nuclear receptors. Calreticulin binds to antibodies in certain sera of systemic lupus and Sjogren patients which contain anti-Ro/SSA antibodies, it is highly conserved among species, and it is located in the endoplasmic and sarcoplasmic reticulum where it may bind calcium. The amino terminus of calreticulin interacts with the DNA-binding domain of the glucocorticoid receptor and prevents the receptor from binding to its specific glucocorticoid response element. Calreticulin can inhibit the binding of androgen receptor to its hormone-responsive DNA element and can inhibit androgen receptor and retinoic acid receptor transcriptional activities in vivo, as well as retinoic acid-induced neuronal differentiation. Thus, calreticulin can act as an important modulator of the regulation of gene transcription by nuclear hormone receptors. Systemic lupus erythematosus is associated with increased autoantibody titers against calreticulin but calreticulin is not a Ro/SS-A antigen. Earlier papers referred to calreticulin as an Ro/SS-A antigen but this was later disproven. Increased autoantibody titer against human calreticulin is found in infants with complete congenital heart block of both the IgG and IgM classes. CALR ENSG00000179218 NA
4633 myosin light chain 2 This gene encodes the regulatory light chain associated with cardiac myosin beta (or slow) heavy chain. Ca2+ triggers the phosphorylation of regulatory light chain that in turn triggers contraction. Mutations in this gene are associated with mid-left ventricular chamber type hypertrophic cardiomyopathy. MYL2 ENSG00000111245 NA
NA NA NA NA ENSG00000256545 TRUE
ENSG00000229344 MT-CO2 pseudogene 12 NA MTCO2P12 ENSG00000229344 NA
493 ATPase plasma membrane Ca2+ transporting 4 The protein encoded by this gene belongs to the family of P-type primary ion transport ATPases characterized by the formation of an aspartyl phosphate intermediate during the reaction cycle. These enzymes remove bivalent calcium ions from eukaryotic cells against very large concentration gradients and play a critical role in intracellular calcium homeostasis. The mammalian plasma membrane calcium ATPase isoforms are encoded by at least four separate genes and the diversity of these enzymes is further increased by alternative splicing of transcripts. The expression of different isoforms and splice variants is regulated in a developmental, tissue- and cell type-specific manner, suggesting that these pumps are functionally adapted to the physiological needs of particular cells and tissues. This gene encodes the plasma membrane calcium ATPase isoform 4. Alternatively spliced transcript variants encoding different isoforms have been identified. ATP2B4 ENSG00000058668 NA
ENSG00000180139 ACTA2 antisense RNA 1 NA ACTA2-AS1 ENSG00000180139 NA
2752 glutamate-ammonia ligase The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. GLUL ENSG00000135821 NA
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 11, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
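
The export step above is repeated once per factor. As a minimal sketch, the per-factor files could be written in one loop. Everything named here is an assumption for illustration: `gene_list_toy` is a hypothetical two-row stand-in for `gene_list` (IDs copied from the Factor 11 and Factor 12 tables), `factor_ids` is a hypothetical index vector, and `tempdir()` replaces the `../utilities/GTEX2013_sparse_fac_sqrt/` directory. The original writes `out$query` rather than rows of `gene_list`, so this is not the document's exact procedure.

```r
# Hypothetical toy stand-in for gene_list: one row of Ensembl IDs per factor.
gene_list_toy <- rbind(
  c("ENSG00000156113", "ENSG00000035862"),  # two IDs from the Factor 11 table
  c("ENSG00000115414", "ENSG00000026025")   # two IDs from the Factor 12 table
)
factor_ids <- c(11, 12)   # hypothetical: factor number for each row above
out_dir <- tempdir()      # assumption: stands in for the real output directory

# Write one plain-text file of gene IDs per factor and collect the paths.
paths <- vapply(seq_along(factor_ids), function(i) {
  path <- file.path(out_dir, paste0("gene_names_clus_", factor_ids[i], ".txt"))
  write.table(gene_list_toy[i, ], path,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  path
}, character(1))

print(basename(paths))
```

Reading a file back with `readLines(paths[1])` is a quick way to confirm that each file holds one ID per line.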

Factor 12 Annotations

out <- mygene::queryMany(gene_list[12,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id query name summary notfound
FN1 2335 ENSG00000115414 fibronectin 1 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. NA
VIM 7431 ENSG00000026025 vimentin This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. NA
SPARC 6678 ENSG00000113140 secreted protein acidic and cysteine rich This gene encodes a cysteine-rich acidic matrix-associated protein. The encoded protein is required for the collagen in bone to become calcified but is also involved in extracellular matrix synthesis and promotion of changes to cell shape. The gene product has been associated with tumor suppression but has also been correlated with metastasis based on changes to cell shape which can promote tumor cell invasion. Three transcript variants encoding different isoforms have been found for this gene. NA
SYNPO 11346 ENSG00000171992 synaptopodin Synaptopodin is an actin-associated protein that may play a role in actin-based cell shape and motility. The name synaptopodin derives from the protein’s associations with postsynaptic densities and dendritic spines and with renal podocytes (Mundel et al., 1997 [PubMed 9314539]). NA
VIM-AS1 100507347 ENSG00000229124 VIM antisense RNA 1 NA NA
MYH9 4627 ENSG00000100345 myosin, heavy chain 9, non-muscle This gene encodes a conventional non-muscle myosin; this protein should not be confused with the unconventional myosin-9a or 9b (MYO9A or MYO9B). The encoded protein is a myosin IIA heavy chain that contains an IQ domain and a myosin head-like domain which is involved in several important functions, including cytokinesis, cell motility and maintenance of cell shape. Defects in this gene have been associated with non-syndromic sensorineural deafness autosomal dominant type 17, Epstein syndrome, Alport syndrome with macrothrombocytopenia, Sebastian syndrome, Fechtner syndrome and macrothrombocytopenia with progressive sensorineural deafness. NA
RP11-124N14.3 ENSG00000234961 ENSG00000234961 NA NA NA
MYH11 4629 ENSG00000133392 myosin, heavy chain 11, smooth muscle The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. NA
LTBP2 4053 ENSG00000119681 latent transforming growth factor beta binding protein 2 The protein encoded by this gene belongs to the family of latent transforming growth factor (TGF)-beta binding proteins (LTBP), which are extracellular matrix proteins with multi-domain structure. This protein is the largest member of the LTBP family possessing unique regions and with most similarity to the fibrillins. It has thus been suggested that it may have multiple functions: as a member of the TGF-beta latent complex, as a structural component of microfibrils, and a role in cell adhesion. NA
MFGE8 4240 ENSG00000140545 milk fat globule-EGF factor 8 protein This gene encodes a preproprotein that is proteolytically processed to form multiple protein products. The major encoded protein product, lactadherin, is a membrane glycoprotein that promotes phagocytosis of apoptotic cells. This protein has also been implicated in wound healing, autoimmune disease, and cancer. Lactadherin can be further processed to form a smaller cleavage product, medin, which comprises the major protein component of aortic medial amyloid (AMA). Alternative splicing results in multiple transcript variants. NA
CTGF 1490 ENSG00000118523 connective tissue growth factor The protein encoded by this gene is a mitogen that is secreted by vascular endothelial cells. The encoded protein plays a role in chondrocyte proliferation and differentiation, cell adhesion in many cell types, and is related to platelet-derived growth factor. Certain polymorphisms in this gene have been linked with a higher incidence of systemic sclerosis. NA
NOTCH3 4854 ENSG00000074181 notch 3 This gene encodes the third discovered human homologue of the Drosophila melanogaster type I membrane protein notch. In Drosophila, notch interaction with its cell-bound ligands (delta, serrate) establishes an intercellular signalling pathway that plays a key role in neural development. Homologues of the notch-ligands have also been identified in human, but precise interactions between these ligands and the human notch homologues remain to be determined. Mutations in NOTCH3 have been identified as the underlying cause of cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL). NA
LOC100129518 100129518 ENSG00000112096 uncharacterized LOC100129518 NA NA
SOD2 6648 ENSG00000112096 superoxide dismutase 2, mitochondrial This gene is a member of the iron/manganese superoxide dismutase family. It encodes a mitochondrial protein that forms a homotetramer and binds one manganese ion per subunit. This protein binds to the superoxide byproducts of oxidative phosphorylation and converts them to hydrogen peroxide and diatomic oxygen. Mutations in this gene have been associated with idiopathic cardiomyopathy (IDC), premature aging, sporadic motor neuron disease, and cancer. Alternative splicing of this gene results in multiple transcript variants. A related pseudogene has been identified on chromosome 1. NA
S100A9 6280 ENSG00000163220 S100 calcium binding protein A9 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. NA
PKM 5315 ENSG00000067225 pyruvate kinase, muscle This gene encodes a protein involved in glycolysis. The encoded protein is a pyruvate kinase that catalyzes the transfer of a phosphoryl group from phosphoenolpyruvate to ADP, generating ATP and pyruvate. This protein has been shown to interact with thyroid hormone and may mediate cellular metabolic effects induced by thyroid hormones. This protein has been found to bind Opa protein, a bacterial outer membrane protein involved in gonococcal adherence to and invasion of human cells, suggesting a role of this protein in bacterial pathogenesis. Several alternatively spliced transcript variants encoding a few distinct isoforms have been reported. NA
TIMP2 7077 ENSG00000035862 TIMP metallopeptidase inhibitor 2 This gene is a member of the TIMP gene family. The proteins encoded by this gene family are natural inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix. In addition to an inhibitory role against metalloproteinases, the encoded protein has a unique role among TIMP family members in its ability to directly suppress the proliferation of endothelial cells. As a result, the encoded protein may be critical to the maintenance of tissue homeostasis by suppressing the proliferation of quiescent tissues in response to angiogenic factors, and by inhibiting protease activity in tissues undergoing remodelling of the extracellular matrix. NA
PLEC 5339 ENSG00000178209 plectin Plectin is a prominent member of an important family of structurally and in part functionally related proteins, termed plakins or cytolinkers, that are capable of interlinking different elements of the cytoskeleton. Plakins, with their multi-domain structure and enormous size, not only play crucial roles in maintaining cell and tissue integrity and orchestrating dynamic changes in cytoarchitecture and cell shape, but also serve as scaffolding platforms for the assembly, positioning, and regulation of signaling complexes (reviewed in PMID: 9701547, 11854008, and 17499243). Plectin is expressed as several protein isoforms in a wide range of cell types and tissues from a single gene located on chromosome 8 in humans (PMID: 8633055, 8698233). Until 2010, this locus was named plectin 1 (symbol PLEC1 in human; Plec1 in mouse and rat) and the gene product had been referred to as ‘hemidesmosomal protein 1’ or ‘plectin 1, intermediate filament binding 500kDa’. These names were superseded by plectin. The plectin gene locus in mouse on chromosome 15 has been analyzed in detail (PMID: 10556294, 14559777), revealing a genomic exon-intron organization with well over 40 exons spanning over 62 kb and an unusual 5’ transcript complexity of plectin isoforms. Eleven exons (1-1j) have been identified that alternatively splice directly into a common exon 2 which is the first exon to encode plectin’s highly conserved actin binding domain (ABD). Three additional exons (-1, 0a, and 0) splice into an alternative first coding exon (1c), and two additional exons (2alpha and 3alpha) are optionally spliced within the exons encoding the actin binding domain (exons 2-8). Analysis of the human locus has identified eight of the eleven alternative 5’ exons found in mouse and rat (PMID: 14672974); exons 1i, 1j and 1h have not been confirmed in human. Furthermore, isoforms lacking the central rod domain encoded by exon 31 have been detected in mouse (PMID:10556294), rat (PMID: 9177781), and human (PMID: 11441066, 10780662, 20052759). The short alternative amino-terminal sequences encoded by the different first exons direct the targeting of the various isoforms to distinct subcellular locations (PMID: 14559777). As the expression of specific plectin isoforms was found to be dependent on cell type (tissue) and stage of development (PMID: 10556294, 12542521, 17389230) it appears that each cell type (tissue) contains a unique set (proportion and composition) of plectin isoforms, as if custom-made for specific requirements of the particular cells. Concordantly, individual isoforms were found to carry out distinct and specific functions (PMID: 14559777, 12542521, 18541706). In 1996, a number of groups reported that patients suffering from epidermolysis bullosa simplex with muscular dystrophy (EBS-MD) lacked plectin expression in skin and muscle tissues due to defects in the plectin gene (PMID: 8698233, 8941634, 8636409, 8894687, 8696340). Two other subtypes of plectin-related EBS have been described: EBS-pyloric atresia (PA) and EBS-Ogna. For reviews of plectin-related diseases see PMID: 15810881, 19945614. Mutations in the plectin gene related to human diseases should be named based on the position in NM_000445 (variant 1, isoform 1c), unless the mutation is located within one of the other alternative first exons, in which case the position in the respective Reference Sequence should be used. NA
LTBP4 8425 ENSG00000090006 latent transforming growth factor beta binding protein 4 The protein encoded by this gene binds transforming growth factor beta (TGFB) as it is secreted and targeted to the extracellular matrix. TGFB is biologically latent after secretion and insertion into the extracellular matrix, and sheds TGFB and other proteins upon activation. Defects in this gene may be a cause of cutis laxa and severe pulmonary, gastrointestinal, and urinary abnormalities. Three transcript variants encoding different isoforms have been found for this gene. NA
DKK3 27122 ENSG00000050165 dickkopf WNT signaling pathway inhibitor 3 This gene encodes a protein that is a member of the dickkopf family. The secreted protein contains two cysteine rich regions and is involved in embryonic development through its interactions with the Wnt signaling pathway. The expression of this gene is decreased in a variety of cancer cell lines and it may function as a tumor suppressor gene. Alternative splicing results in multiple transcript variants encoding the same protein. NA
IGFBP7 3490 ENSG00000163453 insulin like growth factor binding protein 7 This gene encodes a member of the insulin-like growth factor (IGF)-binding protein (IGFBP) family. IGFBPs bind IGFs with high affinity, and regulate IGF availability in body fluids and tissues and modulate IGF binding to its receptors. This protein binds IGF-I and IGF-II with relatively low affinity, and belongs to a subfamily of low-affinity IGFBPs. It also stimulates prostacyclin production and cell adhesion. Alternatively spliced transcript variants encoding different isoforms have been described for this gene, and one variant has been associated with retinal arterial macroaneurysm (PMID:21835307). NA
GAS6 2621 ENSG00000183087 growth arrest specific 6 This gene encodes a gamma-carboxyglutamic acid (Gla)-containing protein thought to be involved in the stimulation of cell proliferation. This gene is frequently overexpressed in many cancers and has been implicated as an adverse prognostic marker. Elevated protein levels are additionally associated with a variety of disease states, including venous thromboembolic disease, systemic lupus erythematosus, chronic renal failure, and preeclampsia. NA
ELN 2006 ENSG00000049540 elastin This gene encodes a protein that is one of the two components of elastic fibers. The encoded protein is rich in hydrophobic amino acids such as glycine and proline, which form mobile hydrophobic regions bounded by crosslinks between lysine residues. Deletions and mutations in this gene are associated with supravalvular aortic stenosis (SVAS) and autosomal dominant cutis laxa. Multiple transcript variants encoding different isoforms have been found for this gene. NA
S100A8 6279 ENSG00000143546 S100 calcium binding protein A8 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and as a cytokine. Altered expression of this protein is associated with the disease cystic fibrosis. Multiple transcript variants encoding different isoforms have been found for this gene. NA
AEBP1 165 ENSG00000106624 AE binding protein 1 This gene encodes a member of carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. NA
FBLN5 10516 ENSG00000140092 fibulin 5 The protein encoded by this gene is a secreted, extracellular matrix protein containing an Arg-Gly-Asp (RGD) motif and calcium-binding EGF-like domains. It promotes adhesion of endothelial cells through interaction of integrins and the RGD motif. It is prominently expressed in developing arteries but less so in adult vessels. However, its expression is reinduced in balloon-injured vessels and atherosclerotic lesions, notably in intimal vascular smooth muscle cells and endothelial cells. Therefore, the protein encoded by this gene may play a role in vascular development and remodeling. Defects in this gene are a cause of autosomal dominant cutis laxa, autosomal recessive cutis laxa type I (CL type I), and age-related macular degeneration type 3 (ARMD3). NA
KRT13 3860 ENSG00000171401 keratin 13 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. NA
ACTA1 58 ENSG00000143632 actin, alpha 1, skeletal muscle The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. NA
CRIP2 1397 ENSG00000182809 cysteine rich protein 2 This gene encodes a putative transcription factor with two LIM zinc-binding domains. The encoded protein may participate in the differentiation of smooth muscle tissue. Alternative splicing results in multiple transcript variants. NA
GAPDH 2597 ENSG00000111640 glyceraldehyde-3-phosphate dehydrogenase This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. NA
APOD 347 ENSG00000189058 apolipoprotein D This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. NA
PRELP 5549 ENSG00000188783 proline and arginine rich end leucine rich repeat protein The protein encoded by this gene is a leucine-rich repeat protein present in connective tissue extracellular matrix. This protein functions as a molecule anchoring basement membranes to the underlying connective tissue. This protein has been shown to bind type I collagen to basement membranes and type II collagen to cartilage. It also binds the basement membrane heparan sulfate proteoglycan perlecan. This protein is suggested to be involved in the pathogenesis of Hutchinson-Gilford progeria (HGP), which is reported to lack the binding of collagen in basement membranes and cartilage. Alternatively spliced transcript variants encoding the same protein have been observed. NA
S100A4 6275 ENSG00000196154 S100 calcium binding protein A4 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in motility, invasion, and tubulin polymerization. Chromosomal rearrangements and altered expression of this gene have been implicated in tumor metastasis. Multiple alternatively spliced variants, encoding the same protein, have been identified. NA
S100A6 6277 ENSG00000197956 S100 calcium binding protein A6 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in stimulation of Ca2+-dependent insulin release, stimulation of prolactin secretion, and exocytosis. Chromosomal rearrangements and altered expression of this gene have been implicated in melanoma. NA
SLC44A2 57153 ENSG00000129353 solute carrier family 44 member 2 NA NA
PSAP 5660 ENSG00000197746 prosaposin This gene encodes a highly conserved preproprotein that is proteolytically processed to generate four main cleavage products including saposins A, B, C, and D. Each domain of the precursor protein is approximately 80 amino acid residues long with nearly identical placement of cysteine residues and glycosylation sites. Saposins A-D localize primarily to the lysosomal compartment where they facilitate the catabolism of glycosphingolipids with short oligosaccharide groups. The precursor protein exists both as a secretory protein and as an integral membrane protein and has neurotrophic activities. Mutations in this gene have been associated with Gaucher disease and metachromatic leukodystrophy. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. NA
MPZ 4359 ENSG00000158887 myelin protein zero This gene is specifically expressed in Schwann cells of the peripheral nervous system and encodes a type I transmembrane glycoprotein that is a major structural protein of the peripheral myelin sheath. The encoded protein contains a large hydrophobic extracellular domain and a smaller basic intracellular domain, which are essential for the formation and stabilization of the multilamellar structure of the compact myelin. Mutations in this gene are associated with autosomal dominant form of Charcot-Marie-Tooth disease type 1 (CMT1B) and other polyneuropathies, such as Dejerine-Sottas syndrome (DSS) and congenital hypomyelinating neuropathy (CHN). A recent study showed that two isoforms are produced from the same mRNA by use of alternative in-frame translation termination codons via a stop codon readthrough mechanism. NA
HSPG2 3339 ENSG00000142798 heparan sulfate proteoglycan 2 This gene encodes the perlecan protein, which consists of a core protein to which three long chains of glycosaminoglycans (heparan sulfate or chondroitin sulfate) are attached. The perlecan protein is a large multidomain proteoglycan that binds to and cross-links many extracellular matrix components and cell-surface molecules. It has been shown that this protein interacts with laminin, prolargin, collagen type IV, FGFBP1, FBLN2, FGF7 and transthyretin, etc., and it plays essential roles in multiple biological activities. Perlecan is a key component of the vascular extracellular matrix, where it helps to maintain the endothelial barrier function. It is a potent inhibitor of smooth muscle cell proliferation and is thus thought to help maintain vascular homeostasis. It can also promote growth factor (e.g., FGF2) activity and thus stimulate endothelial growth and re-generation. It is a major component of basement membranes, where it is involved in the stabilization of other molecules as well as being involved with glomerular permeability to macromolecules and cell adhesion. Mutations in this gene cause Schwartz-Jampel syndrome type 1, Silverman-Handmaker type of dyssegmental dysplasia, and tardive dyskinesia. Alternative splicing of this gene results in multiple transcript variants. NA
NA NA ENSG00000117289 NA NA TRUE
WISP2 8839 ENSG00000064205 WNT1 inducible signaling pathway protein 2 This gene encodes a member of the WNT1 inducible signaling pathway (WISP) protein subfamily, which belongs to the connective tissue growth factor (CTGF) family. WNT1 is a member of a family of cysteine-rich, glycosylated signaling proteins that mediate diverse developmental processes. The CTGF family members are characterized by four conserved cysteine-rich domains: insulin-like growth factor-binding domain, von Willebrand factor type C module, thrombospondin domain and C-terminal cystine knot-like (CT) domain. The encoded protein lacks the CT domain which is implicated in dimerization and heparin binding. It is 72% identical to the mouse protein at the amino acid level. This gene may be downstream in the WNT1 signaling pathway that is relevant to malignant transformation. Its expression in colon tumors is reduced while the other two WISP members are overexpressed in colon tumors. It is expressed at high levels in bone tissue, and may play an important role in modulating bone turnover. NA
AHNAK 79026 ENSG00000124942 AHNAK nucleoprotein NA NA
ACTG2 72 ENSG00000163017 actin, gamma 2, smooth muscle, enteric Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. NA
COL1A2 1278 ENSG00000164692 collagen type I alpha 2 chain This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. NA
VCAN 1462 ENSG00000038427 versican This gene is a member of the aggrecan/versican proteoglycan family. The protein encoded is a large chondroitin sulfate proteoglycan and is a major component of the extracellular matrix. This protein is involved in cell adhesion, proliferation, migration and angiogenesis and plays a central role in tissue morphogenesis and maintenance. Mutations in this gene are the cause of Wagner syndrome type 1. Multiple transcript variants encoding different isoforms have been found for this gene. NA
MAP4 4134 ENSG00000047849 microtubule associated protein 4 The protein encoded by this gene is a major non-neuronal microtubule-associated protein. This protein contains a domain similar to the microtubule-binding domains of neuronal microtubule-associated protein (MAP2) and microtubule-associated protein tau (MAPT/TAU). This protein promotes microtubule assembly, and has been shown to counteract destabilization of interphase microtubule catastrophe promotion. Cyclin B was found to interact with this protein, which targets cell division cycle 2 (CDC2) kinase to microtubules. The phosphorylation of this protein affects microtubule properties and cell cycle progression. Multiple transcript variants encoding different isoforms have been found for this gene. NA
EFEMP1 2202 ENSG00000115380 EGF containing fibulin like extracellular matrix protein 1 This gene encodes a member of the fibulin family of extracellular matrix glycoproteins. Like all members of this family, the encoded protein contains tandemly repeated epidermal growth factor-like repeats followed by a C-terminus fibulin-type domain. This gene is upregulated in malignant gliomas and may play a role in the aggressive nature of these tumors. Mutations in this gene are associated with Doyne honeycomb retinal dystrophy. Alternatively spliced transcript variants that encode the same protein have been described. NA
ADIRF 10974 ENSG00000148671 adipogenesis regulatory factor APM2 gene is exclusively expressed in adipose tissue. Its function is currently unknown. NA
FMOD 2331 ENSG00000122176 fibromodulin Fibromodulin belongs to the family of small interstitial proteoglycans. The encoded protein possesses a central region containing leucine-rich repeats with 4 keratan sulfate chains, flanked by terminal domains containing disulphide bonds. Owing to the interaction with type I and type II collagen fibrils and in vitro inhibition of fibrillogenesis, the encoded protein may play a role in the assembly of extracellular matrix. It may also regulate TGF-beta activities by sequestering TGF-beta into the extracellular matrix. Sequence variations in this gene may be associated with the pathogenesis of high myopia. Alternative splicing results in multiple transcript variants. NA
SLC2A1 6513 ENSG00000117394 solute carrier family 2 member 1 This gene encodes a major glucose transporter in the mammalian blood-brain barrier. The encoded protein is found primarily in the cell membrane and on the cell surface, where it can also function as a receptor for human T-cell leukemia virus (HTLV) I and II. Mutations in this gene have been found in a family with paroxysmal exertion-induced dyskinesia. NA
CYB5R3 1727 ENSG00000100243 cytochrome b5 reductase 3 This gene encodes cytochrome b5 reductase, which includes a membrane-bound form in somatic cells (anchored in the endoplasmic reticulum, mitochondrial and other membranes) and a soluble form in erythrocytes. The membrane-bound form exists mainly on the cytoplasmic side of the endoplasmic reticulum and functions in desaturation and elongation of fatty acids, in cholesterol biosynthesis, and in drug metabolism. The erythrocyte form is located in a soluble fraction of circulating erythrocytes and is involved in methemoglobin reduction. The membrane-bound form has both membrane-binding and catalytic domains, while the soluble form has only the catalytic domain. Alternate splicing results in multiple transcript variants. Mutations in this gene cause methemoglobinemias. NA
THBS2 7058 ENSG00000186340 thrombospondin 2 The protein encoded by this gene belongs to the thrombospondin family. It is a disulfide-linked homotrimeric glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein has been shown to function as a potent inhibitor of tumor growth and angiogenesis. Studies of the mouse counterpart suggest that this protein may modulate the cell surface properties of mesenchymal cells and be involved in cell adhesion and migration. NA
CRYAB 1410 ENSG00000109846 crystallin alpha B Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. NA
SERPINE1 5054 ENSG00000106366 serpin family E member 1 This gene encodes a member of the serine proteinase inhibitor (serpin) superfamily. This member is the principal inhibitor of tissue plasminogen activator (tPA) and urokinase (uPA), and hence is an inhibitor of fibrinolysis. Defects in this gene are the cause of plasminogen activator inhibitor-1 deficiency (PAI-1 deficiency), and high concentrations of the gene product are associated with thrombophilia. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
PRX 57716 ENSG00000105227 periaxin This gene encodes a protein involved in peripheral nerve myelin upkeep. The encoded protein contains 2 PDZ domains which were named after PSD95 (post synaptic density protein), DlgA (Drosophila disc large tumor suppressor), and ZO1 (a mammalian tight junction protein). Two alternatively spliced transcript variants have been described for this gene which encode different protein isoforms and which are targeted differently in the Schwann cell. Mutations in this gene cause Charcot-Marie-Tooth neuropathy, type 4F and Dejerine-Sottas neuropathy. NA
MYO1D 4642 ENSG00000176658 myosin ID NA NA
ACTN4 81 ENSG00000130402 actinin alpha 4 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, alpha actinin isoform which is concentrated in the cytoplasm, and thought to be involved in metastatic processes. Mutations in this gene have been associated with focal and segmental glomerulosclerosis. NA
C1orf198 84886 ENSG00000119280 chromosome 1 open reading frame 198 NA NA
KRT4 3851 ENSG00000170477 keratin 4 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. NA
ITGB5 3693 ENSG00000082781 integrin subunit beta 5 NA NA
IGFBP6 3489 ENSG00000167779 insulin like growth factor binding protein 6 NA NA
SUSD2 56241 ENSG00000099994 sushi domain containing 2 NA NA
ITPR3 3710 ENSG00000096433 inositol 1,4,5-trisphosphate receptor type 3 This gene encodes a receptor for inositol 1,4,5-trisphosphate, a second messenger that mediates the release of intracellular calcium. The receptor contains a calcium channel at the C-terminus and the ligand-binding site at the N-terminus. Knockout studies in mice suggest that type 2 and type 3 inositol 1,4,5-trisphosphate receptors play a key role in exocrine secretion underlying energy metabolism and growth. NA
CTSD 1509 ENSG00000117984 cathepsin D This gene encodes a member of the A1 family of peptidases. The encoded preproprotein is proteolytically processed to generate multiple protein products. These products include the cathepsin D light and heavy chains, which heterodimerize to form the mature enzyme. This enzyme exhibits pepsin-like activity and plays a role in protein turnover and in the proteolytic activation of hormones and growth factors. Mutations in this gene play a causal role in neuronal ceroid lipofuscinosis-10 and may be involved in the pathogenesis of several other diseases, including breast cancer and possibly Alzheimer’s disease. NA
PTGDS 5730 ENSG00000107317 prostaglandin D2 synthase The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may be also involved in the regulation of non-rapid eye movement sleep. NA
MMP2 4313 ENSG00000087245 matrix metallopeptidase 2 This gene is a member of the matrix metalloproteinase (MMP) gene family, that are zinc-dependent enzymes capable of cleaving components of the extracellular matrix and molecules involved in signal transduction. The protein encoded by this gene is a gelatinase A, type IV collagenase, that contains three fibronectin type II repeats in its catalytic site that allow binding of denatured type IV and V collagen and elastin. Unlike most MMP family members, activation of this protein can occur on the cell membrane. This enzyme can be activated extracellularly by proteases, or intracellularly by its S-glutathiolation with no requirement for proteolytical removal of the pro-domain. This protein is thought to be involved in multiple pathways including roles in the nervous system, endometrial menstrual breakdown, regulation of vascularization, and metastasis. Mutations in this gene have been associated with Winchester syndrome and Nodulosis-Arthropathy-Osteolysis (NAO) syndrome. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
LRP1 4035 ENSG00000123384 LDL receptor related protein 1 This gene encodes a member of the low-density lipoprotein receptor family of proteins. The encoded preproprotein is proteolytically processed by furin to generate 515 kDa and 85 kDa subunits that form the mature receptor (PMID: 8546712). This receptor is involved in several cellular processes, including intracellular signaling, lipid homeostasis, and clearance of apoptotic cells. In addition, the encoded protein is necessary for the alpha 2-macroglobulin-mediated clearance of secreted amyloid precursor protein and beta-amyloid, the main component of amyloid plaques found in Alzheimer patients. Expression of this gene decreases with age and has been found to be lower than controls in brain tissue from Alzheimer’s disease patients. NA
LMO7 4008 ENSG00000136153 LIM domain 7 This gene encodes a protein containing a calponin homology (CH) domain, a PDZ domain, and a LIM domain, and may be involved in protein-protein interactions. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene, however, the full-length nature of some variants is not known. NA
COL8A1 1295 ENSG00000144810 collagen type VIII alpha 1 This gene encodes one of the two alpha chains of type VIII collagen. The gene product is a short chain collagen and a major component of the basement membrane of the corneal endothelium. The type VIII collagen fibril can be either a homo- or a heterotrimer. Alternatively spliced transcript variants encoding the same protein have been observed. NA
CRIM1 51232 ENSG00000150938 cysteine rich transmembrane BMP regulator 1 (chordin-like) This gene encodes a transmembrane protein containing six cysteine-rich repeat domains and an insulin-like growth factor-binding domain. The encoded protein may play a role in tissue development through interactions with members of the transforming growth factor beta family, such as bone morphogenetic proteins. NA
SNTA1 6640 ENSG00000101400 syntrophin alpha 1 Syntrophins are cytoplasmic peripheral membrane scaffold proteins that are components of the dystrophin-associated protein complex. This gene is a member of the syntrophin gene family and encodes the most common syntrophin isoform found in cardiac tissues. The N-terminal PDZ domain of this syntrophin protein interacts with the C-terminus of the pore-forming alpha subunit (SCN5A) of the cardiac sodium channel Nav1.5. This protein also associates cardiac sodium channels with the nitric oxide synthase-PMCA4b (plasma membrane Ca-ATPase subtype 4b) complex in cardiomyocytes. This gene is a susceptibility locus for Long-QT syndrome (LQT) - an inherited disorder associated with sudden cardiac death from arrhythmia - and sudden infant death syndrome (SIDS). This protein also associates with dystrophin and dystrophin-related proteins at the neuromuscular junction and alters intracellular calcium ion levels in muscle tissue. NA
ACTN2 88 ENSG00000077522 actinin alpha 2 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a muscle-specific, alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Several transcript variants encoding different isoforms have been found for this gene. NA
ERRFI1 54206 ENSG00000116285 ERBB receptor feedback inhibitor 1 ERRFI1 is a cytoplasmic protein whose expression is upregulated with cell growth (Wick et al., 1995 [PubMed 7641805]). It shares significant homology with the protein product of rat gene-33, which is induced during cell stress and mediates cell signaling (Makkinje et al., 2000 [PubMed 10749885]; Fiorentino et al., 2000 [PubMed 11003669]). NA
LAMP1 3916 ENSG00000185896 lysosomal associated membrane protein 1 The protein encoded by this gene is a member of a family of membrane glycoproteins. This glycoprotein provides selectins with carbohydrate ligands. It may also play a role in tumor cell metastasis. NA
ERBB2 2064 ENSG00000141736 erb-b2 receptor tyrosine kinase 2 This gene encodes a member of the epidermal growth factor (EGF) receptor family of receptor tyrosine kinases. This protein has no ligand binding domain of its own and therefore cannot bind growth factors. However, it does bind tightly to other ligand-bound EGF receptor family members to form a heterodimer, stabilizing ligand binding and enhancing kinase-mediated activation of downstream signalling pathways, such as those involving mitogen-activated protein kinase and phosphatidylinositol-3 kinase. Allelic variations at amino acid positions 654 and 655 of isoform a (positions 624 and 625 of isoform b) have been reported, with the most common allele, Ile654/Ile655, shown here. Amplification and/or overexpression of this gene has been reported in numerous cancers, including breast and ovarian tumors. Alternative splicing results in several additional transcript variants, some encoding different isoforms and others that have not been fully characterized. NA
LDB3 11155 ENSG00000122367 LIM domain binding 3 This gene encodes a PDZ domain-containing protein. PDZ motifs are modular protein-protein interaction domains consisting of 80-120 amino acid residues. PDZ domain-containing proteins interact with each other in cytoskeletal assembly or with other proteins involved in targeting and clustering of membrane proteins. The protein encoded by this gene interacts with alpha-actinin-2 through its N-terminal PDZ domain and with protein kinase C via its C-terminal LIM domains. The LIM domain is a cysteine-rich motif defined by 50-60 amino acids containing two zinc-binding modules. This protein also interacts with all three members of the myozenin family. Mutations in this gene have been associated with myofibrillar myopathy and dilated cardiomyopathy. Alternatively spliced transcript variants encoding different isoforms have been identified; all isoforms have N-terminal PDZ domains while only longer isoforms (1, 2 and 5) have C-terminal LIM domains. NA
NAMPT 10135 ENSG00000105835 nicotinamide phosphoribosyltransferase This gene encodes a protein that catalyzes the condensation of nicotinamide with 5-phosphoribosyl-1-pyrophosphate to yield nicotinamide mononucleotide, one step in the biosynthesis of nicotinamide adenine dinucleotide. The protein belongs to the nicotinic acid phosphoribosyltransferase (NAPRTase) family and is thought to be involved in many important biological processes, including metabolism, stress response and aging. This gene has a pseudogene on chromosome 10. NA
PRKCDBP 112464 ENSG00000170955 protein kinase C delta binding protein The protein encoded by this gene was identified as a binding protein of the protein kinase C, delta (PRKCD). The expression of this gene in cultured cell lines is strongly induced by serum starvation. The expression of this protein was found to be down-regulated in various cancer cell lines, suggesting the possible tumor suppressor function of this protein. NA
PMP22 5376 ENSG00000109099 peripheral myelin protein 22 This gene encodes an integral membrane protein that is a major component of myelin in the peripheral nervous system. Studies suggest two alternately used promoters drive tissue-specific expression. Various mutations of this gene are causes of Charcot-Marie-Tooth disease Type IA, Dejerine-Sottas syndrome, and hereditary neuropathy with liability to pressure palsies. Alternative splicing results in multiple transcript variants. NA
CCDC3 83643 ENSG00000151468 coiled-coil domain containing 3 NA NA
ABCA2 20 ENSG00000107331 ATP binding cassette subfamily A member 2 The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intracellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the ABC1 subfamily. Members of the ABC1 subfamily comprise the only major ABC subfamily found exclusively in multicellular eukaryotes. This protein is highly expressed in brain tissue and may play a role in macrophage lipid metabolism and neural development. Two transcript variants encoding different isoforms have been found for this gene. NA
SERPINB1 1992 ENSG00000021355 serpin family B member 1 The protein encoded by this gene is a member of the serpin family of proteinase inhibitors. Members of this family maintain homeostasis by neutralizing overexpressed proteinase activity through their function as suicide substrates. This protein inhibits the neutrophil-derived proteinases neutrophil elastase, cathepsin G, and proteinase-3 and thus protects tissues from damage at inflammatory sites. Alternative splicing results in multiple transcript variants. NA
SFRP4 6424 ENSG00000106483 secreted frizzled related protein 4 Secreted frizzled-related protein 4 (SFRP4) is a member of the SFRP family that contains a cysteine-rich domain homologous to the putative Wnt-binding site of Frizzled proteins. SFRPs act as soluble modulators of Wnt signaling. The expression of SFRP4 in ventricular myocardium correlates with apoptosis related gene expression. NA
CCDC80 151887 ENSG00000091986 coiled-coil domain containing 80 NA NA
CXCL14 9547 ENSG00000145824 C-X-C motif chemokine ligand 14 This antimicrobial gene belongs to the cytokine gene family which encode secreted proteins involved in immunoregulatory and inflammatory processes. The protein encoded by this gene is structurally related to the CXC (Cys-X-Cys) subfamily of cytokines. Members of this subfamily are characterized by two cysteines separated by a single amino acid. This cytokine displays chemotactic activity for monocytes but not for lymphocytes, dendritic cells, neutrophils or macrophages. It has been implicated that this cytokine is involved in the homeostasis of monocyte-derived macrophages rather than in inflammation. NA
MYL12A 10627 ENSG00000101608 myosin light chain 12A This gene encodes a nonsarcomeric myosin regulatory light chain. This protein is activated by phosphorylation and regulates smooth muscle and non-muscle cell contraction. This protein may also be involved in DNA damage repair by sequestering the transcriptional regulator apoptosis-antagonizing transcription factor (AATF)/Che-1 which functions as a repressor of p53-driven apoptosis. Alternate splicing results in multiple transcript variants. A pseudogene of this gene is found on chromosome 8. NA
TNS3 64759 ENSG00000136205 tensin 3 NA NA
PTRF 284119 ENSG00000177469 polymerase I and transcript release factor This gene encodes a protein that enables the dissociation of paused ternary polymerase I transcription complexes from the 3’ end of pre-rRNA transcripts. This protein regulates rRNA transcription by promoting the dissociation of transcription complexes and the reinitiation of polymerase I on nascent rRNA transcripts. This protein also localizes to caveolae at the plasma membrane and is thought to play a critical role in the formation of caveolae and the stabilization of caveolins. This protein translocates from caveolae to the cytoplasm after insulin stimulation. Caveolae contain truncated forms of this protein and may be the site of phosphorylation-dependent proteolysis. This protein is also thought to modify lipid metabolism and insulin-regulated gene expression. Mutations in this gene result in a disorder characterized by generalized lipodystrophy and muscular dystrophy. NA
ALB 213 ENSG00000163631 albumin Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. NA
SPRR3 6707 ENSG00000163209 small proline rich protein 3 NA NA
KLF2 10365 ENSG00000127528 Kruppel like factor 2 Kruppel-like factors (KLFs) are a family of broadly expressed zinc finger transcription factors. KLF2 regulates T-cell trafficking by promoting expression of the lipid-binding receptor S1P1 (S1PR1; MIM 601974) and the selectin CD62L (SELL; MIM 153240) (summary by Weinreich et al., 2009 [PubMed 19592277]). NA
PSD 5662 ENSG00000059915 pleckstrin and Sec7 domain containing This gene encodes a Pleckstrin homology and SEC7 domains-containing protein that functions as a guanine nucleotide exchange factor. The encoded protein regulates signal transduction by activating ADP-ribosylation factor 6. Alternative splicing results in multiple transcript variants. NA
GSN 2934 ENSG00000148180 gelsolin The protein encoded by this gene binds to the ‘plus’ ends of actin monomers and filaments to prevent monomer exchange. The encoded calcium-regulated protein functions in both assembly and disassembly of actin filaments. Defects in this gene are a cause of familial amyloidosis Finnish type (FAF). Multiple transcript variants encoding several different isoforms have been found for this gene. NA
HSPD1 3329 ENSG00000144381 heat shock protein family D (Hsp60) member 1 This gene encodes a member of the chaperonin family. The encoded mitochondrial protein may function as a signaling molecule in the innate immune system. This protein is essential for the folding and assembly of newly imported proteins in the mitochondria. This gene is adjacent to a related family member and the region between the 2 genes functions as a bidirectional promoter. Several pseudogenes have been associated with this gene. Two transcript variants encoding the same protein have been identified for this gene. Mutations associated with this gene cause autosomal recessive spastic paraplegia 13. NA
SORBS3 10174 ENSG00000120896 sorbin and SH3 domain containing 3 This gene encodes an SH3 domain-containing adaptor protein. The presence of SH3 domains plays a role in this protein’s ability to bind other cytoplasmic molecules and contribute to cytoskeletal organization, cell adhesion and migration, signaling, and gene expression. Multiple transcript variants encoding different isoforms have been found for this gene. NA
UTRN 7402 ENSG00000152818 utrophin This gene shares both structural and functional similarities with the dystrophin gene. It contains an actin-binding N-terminus, a triple coiled-coil repeat central region, and a C-terminus that consists of protein-protein interaction motifs which interact with dystroglycan protein components. The protein encoded by this gene is located at the neuromuscular synapse and myotendinous junctions, where it participates in post-synaptic membrane maintenance and acetylcholine receptor clustering. Mouse studies suggest that this gene may serve as a functional substitute for the dystrophin gene and therefore, may serve as a potential therapeutic alternative to muscular dystrophy which is caused by mutations in the dystrophin gene. Alternative splicing of the utrophin gene has been described; however, the full-length nature of these variants has not yet been determined. NA
AC019349.5 ENSG00000229732 ENSG00000229732 NA NA NA
ITGA10 8515 ENSG00000143127 integrin subunit alpha 10 Integrins are integral transmembrane glycoproteins composed of noncovalently linked alpha and beta chains. They participate in cell adhesion as well as cell-surface mediated signalling. This gene encodes an integrin alpha chain and is expressed at high levels in chondrocytes, where it is transcriptionally regulated by AP-2epsilon and Ets-1. The protein encoded by this gene binds to collagen. Alternative splicing results in multiple transcript variants. NA
ITGBL1 9358 ENSG00000198542 integrin subunit beta like 1 This gene encodes a beta integrin-related protein that is a member of the EGF-like protein family. The encoded protein contains integrin-like cysteine-rich repeats. Alternative splicing results in multiple transcript variants. NA
COL16A1 1307 ENSG00000084636 collagen type XVI alpha 1 chain This gene encodes the alpha chain of type XVI collagen, a member of the FACIT collagen family (fibril-associated collagens with interrupted helices). Members of this collagen family are found in association with fibril-forming collagens such as type I and II, and serve to maintain the integrity of the extracellular matrix. High levels of type XVI collagen have been found in fibroblasts and keratinocytes, and in smooth muscle and amnion. NA
DES 1674 ENSG00000175084 desmin This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. NA
MICAL1 64780 ENSG00000135596 microtubule associated monooxygenase, calponin and LIM domain containing 1 This gene encodes an enzyme that oxidizes methionine residues on actin, thereby promoting depolymerization of actin filaments. This protein interacts with and regulates signalling by NEDD9/CAS-L (neural precursor cell expressed, developmentally down-regulated 9). Alternative splicing results in multiple transcript variants. NA
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 12, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
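
The export step above hard-codes the factor index in the file name. A minimal sketch of the same step as a reusable function follows. The helper name `write_factor_genes`, the use of `tempdir()` as the output directory, and the removal of duplicate IDs are all assumptions made for illustration; the original call writes every element of `out$query` as-is.

```r
# Hypothetical helper (not part of the original analysis):
# write one Ensembl gene ID per line to a per-factor text file.
write_factor_genes <- function(ids, factor_index, out_dir) {
  # Build the file name from the factor index instead of hard-coding it.
  path <- file.path(out_dir, paste0("gene_names_clus_", factor_index, ".txt"))
  # Assumption: duplicate IDs are dropped before writing.
  writeLines(unique(as.character(ids)), path)
  invisible(path)
}

# Toy input using two IDs that appear in this report; one is repeated on purpose.
out_dir <- tempdir()
ids <- c("ENSG00000175084", "ENSG00000163209", "ENSG00000175084")
path <- write_factor_genes(ids, 12, out_dir)

# Read the file back to check what was written.
read_back <- readLines(path)
```

In the analysis itself the call would presumably look like `write_factor_genes(out$query, 12, "../utilities/GTEX2013_sparse_fac_sqrt")`, with the index changed for each factor.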

Factor 13 Annotations

out <- mygene::queryMany(gene_list[13,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
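
The column order of the annotation table is not the same for every factor: the Factor 12 rows list the symbol first, while the Factor 13 table starts with the query ID. A minimal base R sketch of fixing the order before printing is given below. The helper `order_annotation_cols` and the toy data frame are hypothetical stand-ins for `as.data.frame(out)`; the preferred column order is an assumption, not something the original analysis specifies.

```r
# Hypothetical helper: put annotation columns in a fixed order,
# adding any absent field as an all-NA column.
order_annotation_cols <- function(df,
                                  cols = c("query", "symbol", "X_id", "name", "summary")) {
  for (col in setdiff(cols, names(df))) {
    df[[col]] <- NA_character_
  }
  # Preferred columns first, any remaining columns afterwards.
  df[, c(cols, setdiff(names(df), cols)), drop = FALSE]
}

# Toy stand-in for as.data.frame(out), built from two rows shown in this report.
toy <- data.frame(
  symbol = c("DES", "SPRR3"),
  X_id   = c("1674", "6707"),
  query  = c("ENSG00000175084", "ENSG00000163209"),
  name   = c("desmin", "small proline rich protein 3"),
  stringsAsFactors = FALSE
)

ordered <- order_annotation_cols(toy)
names(ordered)
```

The reordered data frame can then be passed to `kable()` in place of `as.data.frame(out)`.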
kable(as.data.frame(out))
query symbol X_id summary name notfound
ENSG00000107796 ACTA2 59 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. actin, alpha 2, smooth muscle, aorta NA
ENSG00000149591 TAGLN 6876 The protein encoded by this gene is a transformation and shape-change sensitive actin cross-linking/gelling protein found in fibroblasts and smooth muscle. Its expression is down-regulated in many cell lines, and this down-regulation may be an early and sensitive marker for the onset of transformation. A functional role of this protein is unclear. Two transcript variants encoding the same protein have been found for this gene. transgelin NA
ENSG00000180139 ACTA2-AS1 ENSG00000180139 NA ACTA2 antisense RNA 1 NA
ENSG00000075624 ACTB 60 This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. actin, beta NA
ENSG00000171401 KRT13 3860 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. keratin 13 NA
ENSG00000186395 KRT10 3858 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. keratin 10 NA
ENSG00000172867 KRT2 3849 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 2 NA
ENSG00000125868 DSTN 11034 The product of this gene belongs to the actin-binding proteins ADF family. This family of proteins is responsible for enhancing the turnover rate of actin in vivo. This gene encodes the actin depolymerizing protein that severs actin filaments (F-actin) and binds to actin monomers (G-actin). Two transcript variants encoding distinct isoforms have been identified for this gene. destrin, actin depolymerizing factor NA
ENSG00000145012 LPP 4026 This gene encodes a member of a subfamily of LIM domain proteins that are characterized by an N-terminal proline-rich region and three C-terminal LIM domains. The encoded protein localizes to the cell periphery in focal adhesions and may be involved in cell-cell adhesion and cell motility. This protein also shuttles through the nucleus and may function as a transcriptional co-activator. This gene is located at the junction of certain disease-related chromosomal translocations, which result in the expression of chimeric proteins that may promote tumor growth. Alternative splicing results in multiple transcript variants. LIM domain containing preferred translocation partner in lipoma NA
ENSG00000096696 DSP 1832 This gene encodes a protein that anchors intermediate filaments to desmosomal plaques and forms an obligate component of functional desmosomes. Mutations in this gene are the cause of several cardiomyopathies and keratodermas, including skin fragility-woolly hair syndrome. Alternative splicing results in multiple transcript variants. desmoplakin NA
ENSG00000163431 LMOD1 25802 The leiomodin 1 protein has a putative membrane-spanning region and 2 types of tandemly repeated blocks. The transcript is expressed in all tissues tested, with the highest levels in thyroid, eye muscle, skeletal muscle, and ovary. Increased expression of leiomodin 1 may be linked to Graves’ disease and thyroid-associated ophthalmopathy. leiomodin 1 NA
ENSG00000115386 REG1A 5967 This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. regenerating family member 1 alpha NA
ENSG00000148600 CDHR1 92211 This gene belongs to the cadherin superfamily of calcium-dependent cell adhesion molecules. The encoded protein is a photoreceptor-specific cadherin that plays a role in outer segment disc morphogenesis. Mutations in this gene are associated with inherited retinal dystrophies. Alternatively spliced transcript variants encoding different isoforms have been identified. cadherin related family member 1 NA
ENSG00000143248 RGS5 8490 This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. regulator of G-protein signaling 5 NA
ENSG00000167768 KRT1 3848 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 1 NA
ENSG00000225630 MTND2P28 ENSG00000225630 NA mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 2 pseudogene 28 NA
ENSG00000133392 MYH11 4629 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. myosin, heavy chain 11, smooth muscle NA
ENSG00000175084 DES 1674 This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. desmin NA
ENSG00000229732 AC019349.5 ENSG00000229732 NA NA NA
ENSG00000171476 HOPX 84525 The protein encoded by this gene is a homeodomain protein that lacks certain conserved residues required for DNA binding. It was reported that choriocarcinoma cell lines and tissues failed to express this gene, which suggested the possible involvement of this gene in malignant conversion of placental trophoblasts. Studies in mice suggest that this protein may interact with serum response factor (SRF) and modulate SRF-dependent cardiac-specific gene expression and cardiac development. Multiple alternatively spliced transcript variants have been identified for this gene. HOP homeobox NA
ENSG00000184009 ACTG1 71 Actins are highly conserved proteins that are involved in various types of cell motility, and maintenance of the cytoskeleton. In vertebrates, three main groups of actin isoforms, alpha, beta and gamma have been identified. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton, and as mediators of internal cell motility. Actin, gamma 1, encoded by this gene, is a cytoplasmic actin found in non-muscle cells. Mutations in this gene are associated with DFNA20/26, a subtype of autosomal dominant non-syndromic sensorineural progressive hearing loss. Alternative splicing results in multiple transcript variants. actin gamma 1 NA
ENSG00000167641 PPP1R14A 94274 The protein encoded by this gene belongs to the protein phosphatase 1 (PP1) inhibitor family. This protein is an inhibitor of smooth muscle myosin phosphatase, and has higher inhibitory activity when phosphorylated. Inhibition of myosin phosphatase leads to increased myosin phosphorylation and enhanced smooth muscle contraction. Alternatively spliced transcript variants encoding different isoforms have been noted for this gene. protein phosphatase 1 regulatory inhibitor subunit 14A NA
ENSG00000072952 MRVI1 10335 This gene is similar to a putative mouse tumor suppressor gene (Mrvi1) that is frequently disrupted by mouse AIDS-related virus (MRV). The encoded protein, which is found in the membrane of the endoplasmic reticulum, is similar to Jaw1, a lymphoid-restricted protein whose expression is down-regulated during lymphoid differentiation. This protein is a substrate of cGMP-dependent kinase-1 (PKG1) that can function as a regulator of IP3-induced calcium release. Studies in mouse suggest that MRV integration at Mrvi1 induces myeloid leukemia by altering the expression of a gene important for myeloid cell growth and/or differentiation, and thus this gene may function as a myeloid leukemia tumor suppressor gene. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene, and alternative translation start sites, including a non-AUG (CUG) start site, are used. murine retrovirus integration site 1 homolog NA
ENSG00000079308 TNS1 7145 The protein encoded by this gene localizes to focal adhesions, regions of the plasma membrane where the cell attaches to the extracellular matrix. This protein crosslinks actin filaments and contains a Src homology 2 (SH2) domain, which is often found in molecules involved in signal transduction. This protein is a substrate of calpain II. Alternative splicing results in multiple transcript variants encoding different isoforms. tensin 1 NA
ENSG00000145824 CXCL14 9547 This antimicrobial gene belongs to the cytokine gene family which encode secreted proteins involved in immunoregulatory and inflammatory processes. The protein encoded by this gene is structurally related to the CXC (Cys-X-Cys) subfamily of cytokines. Members of this subfamily are characterized by two cysteines separated by a single amino acid. This cytokine displays chemotactic activity for monocytes but not for lymphocytes, dendritic cells, neutrophils or macrophages. It has been implicated that this cytokine is involved in the homeostasis of monocyte-derived macrophages rather than in inflammation. C-X-C motif chemokine ligand 14 NA
ENSG00000170477 KRT4 3851 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 4 NA
ENSG00000116133 DHCR24 1718 This gene encodes a flavin adenine dinucleotide (FAD)-dependent oxidoreductase which catalyzes the reduction of the delta-24 double bond of sterol intermediates during cholesterol biosynthesis. The protein contains a leader sequence that directs it to the endoplasmic reticulum membrane. Missense mutations in this gene have been associated with desmosterolosis. Also, reduced expression of the gene occurs in the temporal cortex of Alzheimer disease patients and overexpression has been observed in adrenal gland cancer cells. 24-dehydrocholesterol reductase NA
ENSG00000198467 TPM2 7169 This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. tropomyosin 2 (beta) NA
ENSG00000115414 FN1 2335 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. fibronectin 1 NA
ENSG00000148795 CYP17A1 1586 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. It has both 17alpha-hydroxylase and 17,20-lyase activities and is a key enzyme in the steroidogenic pathway that produces progestins, mineralocorticoids, glucocorticoids, androgens, and estrogens. Mutations in this gene are associated with isolated steroid-17 alpha-hydroxylase deficiency, 17-alpha-hydroxylase/17,20-lyase deficiency, pseudohermaphroditism, and adrenal hyperplasia. cytochrome P450 family 17 subfamily A member 1 NA
ENSG00000128591 FLNC 2318 This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene. filamin C NA
ENSG00000196616 ADH1B 125 The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. alcohol dehydrogenase 1B (class I), beta polypeptide NA
ENSG00000157110 RBPMS 11030 This gene encodes a member of the RNA recognition motif family of RNA-binding proteins. The RNA recognition motif is between 80-100 amino acids in length and family members contain one to four copies of the motif. The RNA recognition motif consists of two short stretches of conserved sequence, as well as a few highly conserved hydrophobic residues. The encoded protein has a single, putative RNA recognition motif in its N-terminus. Alternative splicing results in multiple transcript variants encoding different isoforms. RNA binding protein with multiple splicing NA
ENSG00000143546 S100A8 6279 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and as a cytokine. Altered expression of this protein is associated with the disease cystic fibrosis. Multiple transcript variants encoding different isoforms have been found for this gene. S100 calcium binding protein A8 NA
ENSG00000159176 CSRP1 1465 This gene encodes a member of the cysteine-rich protein (CSRP) family. This gene family includes a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this gene product occurs in proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Alternatively spliced transcript variants have been described. cysteine and glycine rich protein 1 NA
ENSG00000106809 OGN 4969 This gene encodes a member of the small leucine-rich proteoglycan (SLRP) family of proteins. The encoded protein induces ectopic bone formation in conjunction with transforming growth factor beta and may regulate osteoblast differentiation. High expression of the encoded protein may be associated with elevated heart left ventricular mass. Alternative splicing results in multiple transcript variants. osteoglycin NA
ENSG00000163209 SPRR3 6707 NA small proline rich protein 3 NA
ENSG00000197616 MYH6 4624 Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. myosin, heavy chain 6, cardiac muscle, alpha NA
ENSG00000269936 RP11-394O4.5 ENSG00000269936 NA NA NA
ENSG00000049540 ELN 2006 This gene encodes a protein that is one of the two components of elastic fibers. The encoded protein is rich in hydrophobic amino acids such as glycine and proline, which form mobile hydrophobic regions bounded by crosslinks between lysine residues. Deletions and mutations in this gene are associated with supravalvular aortic stenosis (SVAS) and autosomal dominant cutis laxa. Multiple transcript variants encoding different isoforms have been found for this gene. elastin NA
ENSG00000011465 DCN 1634 This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. decorin NA
ENSG00000203782 LOR 4014 This gene encodes loricrin, a major protein component of the cornified cell envelope found in terminally differentiated epidermal cells. Mutations in this gene are associated with Vohwinkel’s syndrome and progressive symmetric erythrokeratoderma, both inherited skin diseases. loricrin NA
ENSG00000257017 HP 3240 This gene encodes a preproprotein, which is processed to yield both alpha and beta chains, which subsequently combine as a tetramer to produce haptoglobin. Haptoglobin functions to bind free plasma hemoglobin, which allows degradative enzymes to gain access to the hemoglobin, while at the same time preventing loss of iron through the kidneys and protecting the kidneys from damage by hemoglobin. Mutations in this gene and/or its regulatory regions cause ahaptoglobinemia or hypohaptoglobinemia. This gene has also been linked to diabetic nephropathy, the incidence of coronary artery disease in type 1 diabetes, Crohn’s disease, inflammatory disease behavior, primary sclerosing cholangitis, susceptibility to idiopathic Parkinson’s disease, and a reduced incidence of Plasmodium falciparum malaria. The protein encoded also exhibits antimicrobial activity against bacteria. A similar duplicated gene is located next to this gene on chromosome 16. Multiple transcript variants encoding different isoforms have been found for this gene. haptoglobin NA
ENSG00000106123 EPHB6 2051 This gene encodes a member of a family of transmembrane proteins that function as receptors for ephrin-B family proteins. Unlike other members of this family, the encoded protein does not contain a functional kinase domain. Activity of this protein can influence cell adhesion and migration. Expression of this gene is downregulated during tumor progression, suggesting that the protein may suppress tumor invasion and metastasis. Alternative splicing results in multiple transcript variants. EPH receptor B6 NA
ENSG00000111341 MGP 4256 The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. matrix Gla protein NA
ENSG00000077943 ITGA8 8516 Integrins are heterodimeric transmembrane receptor proteins that mediate numerous cellular processes including cell adhesion, cytoskeletal rearrangement, and activation of cell signaling pathways. Integrins are composed of alpha and beta subunits. This gene encodes the alpha 8 subunit of the heterodimeric integrin alpha8beta1 protein. The encoded protein is a single-pass type 1 membrane protein that contains multiple FG-GAP repeats. This repeat is predicted to fold into a beta propeller structure. This gene regulates the recruitment of mesenchymal cells into epithelial structures, mediates cell-cell interactions, and regulates neurite outgrowth of sensory and motor neurons. The integrin alpha8beta1 protein thus plays an important role in wound-healing and organogenesis. Mutations in this gene have been associated with renal hypodysplasia/aplasia-1 (RHDA1) and with several animal models of chronic kidney disease. Alternate splicing results in multiple transcript variants encoding distinct isoforms. integrin subunit alpha 8 NA
ENSG00000204388 HSPA1B 3304 This intronless gene encodes a 70kDa heat shock protein which is a member of the heat shock protein 70 family. In conjunction with other heat shock proteins, this protein stabilizes existing proteins against aggregation and mediates the folding of newly translated proteins in the cytosol and in organelles. It is also involved in the ubiquitin-proteasome pathway through interaction with the AU-rich element RNA-binding protein 1. The gene is located in the major histocompatibility complex class III region, in a cluster with two closely related genes which encode similar proteins. heat shock protein family A (Hsp70) member 1B NA
ENSG00000080573 COL5A3 50509 This gene encodes an alpha chain for one of the low abundance fibrillar collagens. Fibrillar collagen molecules are trimers that can be composed of one or more types of alpha chains. Type V collagen is found in tissues containing type I collagen and appears to regulate the assembly of heterotypic fibers composed of both type I and type V collagen. This gene product is closely related to type XI collagen and it is possible that the collagen chains of types V and XI constitute a single collagen type with tissue-specific chain combinations. Mutations in this gene are thought to be responsible for the symptoms of a subset of patients with Ehlers-Danlos syndrome type III. Messages of several sizes can be detected in northern blots but sequence information cannot confirm the identity of the shorter messages. collagen type V alpha 3 NA
ENSG00000197746 PSAP 5660 This gene encodes a highly conserved preproprotein that is proteolytically processed to generate four main cleavage products including saposins A, B, C, and D. Each domain of the precursor protein is approximately 80 amino acid residues long with nearly identical placement of cysteine residues and glycosylation sites. Saposins A-D localize primarily to the lysosomal compartment where they facilitate the catabolism of glycosphingolipids with short oligosaccharide groups. The precursor protein exists both as a secretory protein and as an integral membrane protein and has neurotrophic activities. Mutations in this gene have been associated with Gaucher disease and metachromatic leukodystrophy. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. prosaposin NA
ENSG00000178585 CTNNBIP1 56998 The protein encoded by this gene binds CTNNB1 and prevents interaction between CTNNB1 and TCF family members. The encoded protein is a negative regulator of the Wnt signaling pathway. Two transcript variants encoding the same protein have been found for this gene. catenin beta interacting protein 1 NA
ENSG00000068078 FGFR3 2261 This gene encodes a member of the fibroblast growth factor receptor (FGFR) family, with its amino acid sequence being highly conserved between members and among divergent species. FGFR family members differ from one another in their ligand affinities and tissue distribution. A full-length representative protein would consist of an extracellular region, composed of three immunoglobulin-like domains, a single hydrophobic membrane-spanning segment and a cytoplasmic tyrosine kinase domain. The extracellular portion of the protein interacts with fibroblast growth factors, setting in motion a cascade of downstream signals, ultimately influencing mitogenesis and differentiation. This particular family member binds acidic and basic fibroblast growth hormone and plays a role in bone development and maintenance. Mutations in this gene lead to craniosynostosis and multiple types of skeletal dysplasia. Three alternatively spliced transcript variants that encode different protein isoforms have been described. fibroblast growth factor receptor 3 NA
ENSG00000163631 ALB 213 Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. albumin NA
ENSG00000087266 SH3BP2 6452 The protein encoded by this gene has an N-terminal pleckstrin homology (PH) domain, an SH3-binding proline-rich region, and a C-terminal SH2 domain. The protein binds to the SH3 domains of several proteins including the ABL1 and SYK protein tyrosine kinases, and functions as a cytoplasmic adaptor protein to positively regulate transcriptional activity in T, natural killer (NK), and basophilic cells. Mutations in this gene result in cherubism. Multiple transcript variants encoding different isoforms have been found for this gene. SH3 domain binding protein 2 NA
ENSG00000259627 RP11-244F12.2 ENSG00000259627 NA NA NA
ENSG00000137857 DUOX1 53905 The protein encoded by this gene is a glycoprotein and a member of the NADPH oxidase family. The synthesis of thyroid hormone is catalyzed by a protein complex located at the apical membrane of thyroid follicular cells. This complex contains an iodide transporter, thyroperoxidase, and a peroxide generating system that includes proteins encoded by this gene and the similar DUOX2 gene. This protein is known as dual oxidase because it has both a peroxidase homology domain and a gp91phox domain. This protein generates hydrogen peroxide and thereby plays a role in the activity of thyroid peroxidase, lactoperoxidase, and in lactoperoxidase-mediated antimicrobial defense at mucosal surfaces. Two alternatively spliced transcript variants encoding the same protein have been described for this gene. dual oxidase 1 NA
ENSG00000140416 TPM1 7168 This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. tropomyosin 1 (alpha) NA
ENSG00000237973 MTCO1P12 ENSG00000237973 NA MT-CO1 pseudogene 12 NA
ENSG00000178372 CALML5 51806 This gene encodes a novel calcium binding protein expressed in the epidermis and related to the calmodulin family of calcium binding proteins. Functional studies with recombinant protein demonstrate it does bind calcium and undergoes a conformational change when it does so. Abundant expression is detected only in reconstructed epidermis and is restricted to differentiating keratinocytes. In addition, it can associate with transglutaminase 3, shown to be a key enzyme in the terminal differentiation of keratinocytes. calmodulin like 5 NA
ENSG00000172023 REG1B 5968 This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. regenerating family member 1 beta NA
ENSG00000159251 ACTC1 70 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). actin, alpha, cardiac muscle 1 NA
ENSG00000065534 MYLK 4638 This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3’ region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts. myosin light chain kinase NA
ENSG00000072110 ACTN1 87 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, cytoskeletal, alpha actinin isoform and maps to the same site as the structurally similar erythroid beta spectrin gene. Three transcript variants encoding different isoforms have been found for this gene. actinin alpha 1 NA
ENSG00000077782 FGFR1 2260 The protein encoded by this gene is a member of the fibroblast growth factor receptor (FGFR) family, where amino acid sequence is highly conserved between members and throughout evolution. FGFR family members differ from one another in their ligand affinities and tissue distribution. A full-length representative protein consists of an extracellular region, composed of three immunoglobulin-like domains, a single hydrophobic membrane-spanning segment and a cytoplasmic tyrosine kinase domain. The extracellular portion of the protein interacts with fibroblast growth factors, setting in motion a cascade of downstream signals, ultimately influencing mitogenesis and differentiation. This particular family member binds both acidic and basic fibroblast growth factors and is involved in limb induction. Mutations in this gene have been associated with Pfeiffer syndrome, Jackson-Weiss syndrome, Antley-Bixler syndrome, osteoglophonic dysplasia, and autosomal dominant Kallmann syndrome 2. Chromosomal aberrations involving this gene are associated with stem cell myeloproliferative disorder and stem cell leukemia lymphoma syndrome. Alternatively spliced variants which encode different protein isoforms have been described; however, not all variants have been fully characterized. fibroblast growth factor receptor 1 NA
ENSG00000163017 ACTG2 72 Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. actin, gamma 2, smooth muscle, enteric NA
ENSG00000161634 DCD 117159 This antimicrobial gene encodes a secreted protein that is subsequently processed into mature peptides of distinct biological activities. The C-terminal peptide is constitutively expressed in sweat and has antibacterial and antifungal activities. The N-terminal peptide, also known as diffusible survival evasion peptide, promotes neural cell survival under conditions of severe oxidative stress. A glycosylated form of the N-terminal peptide may be associated with cachexia (muscle wasting) in cancer patients. Alternative splicing results in multiple transcript variants encoding different isoforms. dermcidin NA
ENSG00000143126 CELSR2 1952 The protein encoded by this gene is a member of the flamingo subfamily, part of the cadherin superfamily. The flamingo subfamily consists of nonclassic-type cadherins; a subpopulation that does not interact with catenins. The flamingo cadherins are located at the plasma membrane and have nine cadherin domains, seven epidermal growth factor-like repeats and two laminin A G-type repeats in their ectodomain. They also have seven transmembrane domains, a characteristic unique to this subfamily. It is postulated that these proteins are receptors involved in contact-mediated communication, with cadherin domains acting as homophilic binding regions and the EGF-like domains involved in cell adhesion and receptor-ligand interactions. The specific function of this particular member has not been determined. cadherin EGF LAG seven-pass G-type receptor 2 NA
ENSG00000156113 KCNMA1 3778 MaxiK channels are large conductance, voltage and calcium-sensitive potassium channels which are fundamental to the control of smooth muscle tone and neuronal excitability. MaxiK channels can be formed by 2 subunits: the pore-forming alpha subunit, which is the product of this gene, and the modulatory beta subunit. Intracellular calcium regulates the physical association between the alpha and beta subunits. Alternatively spliced transcript variants encoding different isoforms have been identified. potassium calcium-activated channel subfamily M alpha 1 NA
ENSG00000081277 PKP1 5317 This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This protein may be involved in molecular recruitment and stabilization during desmosome formation. Mutations in this gene have been associated with the ectodermal dysplasia/skin fragility syndrome. Two transcript variants encoding different isoforms have been found for this gene. plakophilin 1 NA
ENSG00000135046 ANXA1 301 This gene encodes a membrane-localized protein that binds phospholipids. This protein inhibits phospholipase A2 and has anti-inflammatory activity. Loss of function or expression of this gene has been detected in multiple tumors. annexin A1 NA
ENSG00000132470 ITGB4 3691 Integrins are heterodimers comprised of alpha and beta subunits, that are noncovalently associated transmembrane glycoprotein receptors. Different combinations of alpha and beta polypeptides form complexes that vary in their ligand-binding specificities. Integrins mediate cell-matrix or cell-cell adhesion, and transduced signals that regulate gene expression and cell growth. This gene encodes the integrin beta 4 subunit, a receptor for the laminins. This subunit tends to associate with alpha 6 subunit and is likely to play a pivotal role in the biology of invasive carcinoma. Mutations in this gene are associated with epidermolysis bullosa with pyloric atresia. Multiple alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. integrin subunit beta 4 NA
ENSG00000106772 PRUNE2 158471 The protein encoded by this gene belongs to the B-cell CLL/lymphoma 2 and adenovirus E1B 19 kDa interacting family, whose members play roles in many cellular processes including apoptosis, cell transformation, and synaptic function. Several functions for this protein have been demonstrated including suppression of Ras homolog family member A activity, which results in reduced stress fiber formation and suppression of oncogenic cellular transformation. A high molecular weight isoform of this protein has also been shown to colocalize with Adaptor protein complex 2, beta-Adaptin and endodermal markers, suggesting an involvement in post-endocytic trafficking. In prostate cancer cells, this gene acts as a tumor suppressor and its expression is regulated by prostate cancer antigen 3, a non-protein coding gene on the opposite DNA strand in an intron of this gene. Prostate cancer antigen 3 regulates levels of this gene through formation of a double-stranded RNA that undergoes adenosine deaminase acting on RNA-dependent adenosine-to-inosine RNA editing. Alternative splicing results in multiple transcript variants. prune homolog 2 NA
ENSG00000113140 SPARC 6678 This gene encodes a cysteine-rich acidic matrix-associated protein. The encoded protein is required for the collagen in bone to become calcified but is also involved in extracellular matrix synthesis and promotion of changes to cell shape. The gene product has been associated with tumor suppression but has also been correlated with metastasis based on changes to cell shape which can promote tumor cell invasion. Three transcript variants encoding different isoforms have been found for this gene. secreted protein acidic and cysteine rich NA
ENSG00000138735 PDE5A 8654 This gene encodes a cGMP-binding, cGMP-specific phosphodiesterase, a member of the cyclic nucleotide phosphodiesterase family. This phosphodiesterase specifically hydrolyzes cGMP to 5’-GMP. It is involved in the regulation of intracellular concentrations of cyclic nucleotides and is important for smooth muscle relaxation in the cardiovascular system. Alternative splicing of this gene results in three transcript variants encoding distinct isoforms. phosphodiesterase 5A NA
ENSG00000122786 CALD1 800 This gene encodes a calmodulin- and actin-binding protein that plays an essential role in the regulation of smooth muscle and nonmuscle contraction. The conserved domain of this protein possesses the binding activities to Ca(2+)-calmodulin, actin, tropomyosin, myosin, and phospholipids. This protein is a potent inhibitor of the actin-tropomyosin activated myosin MgATPase, and serves as a mediating factor for Ca(2+)-dependent inhibition of smooth muscle contraction. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. caldesmon 1 NA
ENSG00000256309 NA NA NA NA TRUE
ENSG00000103034 NDRG4 65009 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that is required for cell cycle progression and survival in primary astrocytes and may be involved in the regulation of mitogenic signalling in vascular smooth muscle cells. Alternative splicing results in multiple transcripts encoding different isoforms. NDRG family member 4 NA
ENSG00000125730 C3 718 Complement component C3 plays a central role in the activation of complement system. Its activation is required for both classical and alternative complement activation pathways. The encoded preproprotein is proteolytically processed to generate alpha and beta subunits that form the mature protein, which is then further processed to generate numerous peptide products. The C3a peptide, also known as the C3a anaphylatoxin, modulates inflammation and possesses antimicrobial activity. Mutations in this gene are associated with atypical hemolytic uremic syndrome and age-related macular degeneration in human patients. complement component 3 NA
ENSG00000187605 TET3 200424 Members of the ten-eleven translocation (TET) gene family, including TET3, play a role in the DNA methylation process (Langemeijer et al., 2009 [PubMed 19923888]). tet methylcytosine dioxygenase 3 NA
ENSG00000122304 PRM2 5620 Protamines substitute for histones in the chromatin of sperm during the haploid phase of spermatogenesis, and are the major DNA-binding proteins in the nucleus of sperm in many vertebrates. They package the sperm DNA into a highly condensed complex in a volume less than 5% of a somatic cell nucleus. Many mammalian species have only one protamine (protamine 1); however, a few species, including human and mouse, have two. This gene encodes protamine 2, which is cleaved to give rise to a family of protamine 2 peptides. Alternatively spliced transcript variants have also been found for this gene. protamine 2 NA
ENSG00000108828 VAT1 10493 Synaptic vesicles are responsible for regulating the storage and release of neurotransmitters in the nerve terminal. The protein encoded by this gene is an abundant integral membrane protein of cholinergic synaptic vesicles and is thought to be involved in vesicular transport. It belongs to the quinone oxidoreductase subfamily of zinc-containing alcohol dehydrogenase proteins. vesicle amine transport 1 NA
ENSG00000136153 LMO7 4008 This gene encodes a protein containing a calponin homology (CH) domain, a PDZ domain, and a LIM domain, and may be involved in protein-protein interactions. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene, however, the full-length nature of some variants is not known. LIM domain 7 NA
ENSG00000186081 KRT5 3852 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the basal layer of the epidermis with family member KRT14. Mutations in these genes have been associated with a complex of diseases termed epidermolysis bullosa simplex. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 5 NA
ENSG00000173801 JUP 3728 This gene encodes a major cytoplasmic protein which is the only known constituent common to submembranous plaques of both desmosomes and intermediate junctions. This protein forms distinct complexes with cadherins and desmosomal cadherins and is a member of the catenin family since it contains a distinct repeating amino acid motif called the armadillo repeat. Mutation in this gene has been associated with Naxos disease. Alternative splicing occurs in this gene; however, not all transcripts have been fully described. junction plakoglobin NA
ENSG00000175183 CSRP2 1466 CSRP2 is a member of the CSRP family of genes, encoding a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. CRP2 contains two copies of the cysteine-rich amino acid sequence motif (LIM) with putative zinc-binding activity, and may be involved in regulating ordered cell growth. Other genes in the family include CSRP1 and CSRP3. Alternative splicing results in multiple transcript variants. cysteine and glycine rich protein 2 NA
ENSG00000185201 IFITM2 10581 NA interferon induced transmembrane protein 2 NA
ENSG00000164266 SPINK1 6690 The protein encoded by this gene is a trypsin inhibitor, which is secreted from pancreatic acinar cells into pancreatic juice. It is thought to function in the prevention of trypsin-catalyzed premature activation of zymogens within the pancreas and the pancreatic duct. Mutations in this gene are associated with hereditary pancreatitis and tropical calcific pancreatitis. serine peptidase inhibitor, Kazal type 1 NA
ENSG00000118194 TNNT2 7139 The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms, however, the full-length nature of some of these variants has not yet been determined. troponin T2, cardiac type NA
ENSG00000009307 CSDE1 7812 NA cold shock domain containing E1 NA
ENSG00000182871 COL18A1 80781 This gene encodes the alpha chain of type XVIII collagen. This collagen is one of the multiplexins, extracellular matrix proteins that contain multiple triple-helix domains (collagenous domains) interrupted by non-collagenous domains. A long isoform of the protein has an N-terminal domain that is homologous to the extracellular part of frizzled receptors. Proteolytic processing at several endogenous cleavage sites in the C-terminal domain results in production of endostatin, a potent antiangiogenic protein that is able to inhibit angiogenesis and tumor growth. Mutations in this gene are associated with Knobloch syndrome. The main features of this syndrome involve retinal abnormalities, so type XVIII collagen may play an important role in retinal structure and in neural tube closure. Alternative splicing results in multiple transcript variants. collagen type XVIII alpha 1 chain NA
ENSG00000155657 TTN 7273 This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. titin NA
ENSG00000163754 GYG1 2992 This gene encodes a member of the glycogenin family. Glycogenin is a glycosyltransferase that catalyzes the formation of a short glucose polymer from uridine diphosphate glucose in an autoglucosylation reaction. This reaction is followed by elongation and branching of the polymer, catalyzed by glycogen synthase and branching enzyme, to form glycogen. This gene is expressed in muscle and other tissues. Mutations in this gene result in glycogen storage disease XV. This gene has pseudogenes on chromosomes 1, 8 and 13 respectively. Alternatively spliced transcript variants encoding different isoforms have been identified. glycogenin 1 NA
ENSG00000120885 CLU 1191 The protein encoded by this gene is a secreted chaperone that can under some stress conditions also be found in the cell cytosol. It has been suggested to be involved in several basic biological events such as cell death, tumor progression, and neurodegenerative disorders. Alternate splicing results in both coding and non-coding variants. clusterin NA
ENSG00000101335 MYL9 10398 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. myosin light chain 9 NA
ENSG00000109610 SOD3 6649 This gene encodes a member of the superoxide dismutase (SOD) protein family. SODs are antioxidant enzymes that catalyze the conversion of superoxide radicals into hydrogen peroxide and oxygen, which may protect the brain, lungs, and other tissues from oxidative stress. Proteolytic processing of the encoded protein results in the formation of two distinct homotetramers that differ in their ability to interact with the extracellular matrix (ECM). Homotetramers consisting of the intact protein, or type C subunit, exhibit high affinity for heparin and are anchored to the ECM. Homotetramers consisting of a proteolytically cleaved form of the protein, or type A subunit, exhibit low affinity for heparin and do not interact with the ECM. A mutation in this gene may be associated with increased heart disease risk. superoxide dismutase 3, extracellular NA
ENSG00000143536 CRNN 49860 This gene encodes a member of the ‘fused gene’ family of proteins, which contain N-terminus EF-hand domains and multiple tandem peptide repeats. The encoded protein contains two EF-hand Ca2+ binding domains in its N-terminus and two glutamine- and threonine-rich 60 amino acid repeats in its C-terminus. This gene, also known as squamous epithelial heat shock protein 53, may play a role in the mucosal/epithelial immune response and epidermal differentiation. cornulin NA
ENSG00000185532 PRKG1 5592 Mammals have three different isoforms of cyclic GMP-dependent protein kinase (Ialpha, Ibeta, and II). These PRKG isoforms act as key mediators of the nitric oxide/cGMP signaling pathway and are important components of many signal transduction processes in diverse cell types. This PRKG1 gene on human chromosome 10 encodes the soluble Ialpha and Ibeta isoforms of PRKG by alternative transcript splicing. A separate gene on human chromosome 4, PRKG2, encodes the membrane-bound PRKG isoform II. The PRKG1 proteins play a central role in regulating cardiovascular and neuronal functions in addition to relaxing smooth muscle tone, preventing platelet aggregation, and modulating cell growth. This gene is most strongly expressed in all types of smooth muscle, platelets, cerebellar Purkinje cells, hippocampal neurons, and the lateral amygdala. Isoforms Ialpha and Ibeta have identical cGMP-binding and catalytic domains but differ in their leucine/isoleucine zipper and autoinhibitory sequences and therefore differ in their dimerization substrates and kinase enzyme activity. protein kinase, cGMP-dependent, type I NA
ENSG00000132329 RAMP1 10267 The protein encoded by this gene is a member of the RAMP family of single-transmembrane-domain proteins, called receptor (calcitonin) activity modifying proteins (RAMPs). RAMPs are type I transmembrane proteins with an extracellular N terminus and a cytoplasmic C terminus. RAMPs are required to transport calcitonin-receptor-like receptor (CRLR) to the plasma membrane. CRLR, a receptor with seven transmembrane domains, can function as either a calcitonin-gene-related peptide (CGRP) receptor or an adrenomedullin receptor, depending on which members of the RAMP family are expressed. In the presence of this (RAMP1) protein, CRLR functions as a CGRP receptor. The RAMP1 protein is involved in the terminal glycosylation, maturation, and presentation of the CGRP receptor to the cell surface. Alternative splicing results in multiple transcript variants encoding different isoforms. receptor activity modifying protein 1 NA
ENSG00000244734 HBB 3043 The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. hemoglobin subunit beta NA
ENSG00000147872 PLIN2 123 The protein encoded by this gene belongs to the perilipin family, members of which coat intracellular lipid storage droplets. This protein is associated with the lipid globule surface membrane material, and may be involved in development and maintenance of adipose tissue. However, it is not restricted to adipocytes as previously thought, but is found in a wide range of cultured cell lines, including fibroblasts, endothelial and epithelial cells, and tissues, such as lactating mammary gland, adrenal cortex, Sertoli and Leydig cells, and hepatocytes in alcoholic liver cirrhosis, suggesting that it may serve as a marker of lipid accumulation in diverse cell types and diseases. Alternatively spliced transcript variants have been found for this gene. perilipin 2 NA
ENSG00000159069 FBXW5 54461 This gene encodes a member of the F-box protein family, members of which are characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into three classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene contains WD-40 domains, in addition to an F-box motif, so it belongs to the Fbw class. Alternatively spliced transcript variants encoding distinct isoforms have been identified for this gene, however, they were found to be nonsense-mediated mRNA decay (NMD) candidates, hence not represented. F-box and WD repeat domain containing 5 NA
# Save the Ensembl IDs queried for factor 13, one ID per line.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 13, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
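
The export step above hardcodes the factor index in the file name. A minimal sketch of how it could be wrapped in a reusable function is shown here; `write_factor_genes` and its `out_dir` argument are hypothetical names introduced for illustration and are not part of the original analysis. The example writes a toy vector of two Ensembl IDs to a temporary directory instead of `../utilities/`.

```r
# Hypothetical helper (not part of the original analysis): write the gene IDs
# for factor `k` to gene_names_clus_<k>.txt under `out_dir`, one ID per line.
write_factor_genes <- function(gene_ids, k, out_dir) {
  out_file <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(gene_ids, out_file,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(out_file)  # return the path so callers can check or reuse it
}

# Toy usage: two Ensembl IDs that appear in the annotation tables,
# written to a temporary directory rather than the project tree.
toy_ids <- c("ENSG00000244734", "ENSG00000163220")
toy_file <- write_factor_genes(toy_ids, k = 14, out_dir = tempdir())
readLines(toy_file)
```

Under these assumptions, the factor 13 export would become `write_factor_genes(out$query, 13, "../utilities/GTEX2013_sparse_fac_sqrt")`, and the same call could be repeated for each factor with only the index changed.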

Factor 14 Annotations

out <- mygene::queryMany(gene_list[14,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
kable(as.data.frame(out))
symbol query summary name X_id
S100A9 ENSG00000163220 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. S100 calcium binding protein A9 6280
HBB ENSG00000244734 The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. hemoglobin subunit beta 3043
S100A8 ENSG00000143546 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and as a cytokine. Altered expression of this protein is associated with the disease cystic fibrosis. Multiple transcript variants encoding different isoforms have been found for this gene. S100 calcium binding protein A8 6279
HBA2 ENSG00000188536 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. hemoglobin subunit alpha 2 3040
KRT13 ENSG00000171401 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. keratin 13 3860
REG1A ENSG00000115386 This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. regenerating family member 1 alpha 5967
KRT4 ENSG00000170477 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 4 3851
SVIL ENSG00000197321 This gene encodes a bipartite protein with distinct amino- and carboxy-terminal domains. The amino-terminus contains nuclear localization signals and the carboxy-terminus contains numerous consecutive sequences with extensive similarity to proteins in the gelsolin family of actin-binding proteins, which cap, nucleate, and/or sever actin filaments. The gene product is tightly associated with both actin filaments and plasma membranes, suggesting a role as a high-affinity link between the actin cytoskeleton and the membrane. The encoded protein appears to aid in both myosin II assembly during cell spreading and disassembly of focal adhesions. Several transcript variants encoding different isoforms of supervillin have been described. supervillin 6840
DES ENSG00000175084 This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. desmin 1674
MYO1F ENSG00000142347 NA myosin IF 4542
C10orf54 ENSG00000107738 NA chromosome 10 open reading frame 54 64115
CSF3R ENSG00000119535 The protein encoded by this gene is the receptor for colony stimulating factor 3, a cytokine that controls the production, differentiation, and function of granulocytes. The encoded protein, which is a member of the family of cytokine receptors, may also function in some cell surface adhesion or recognition processes. Alternatively spliced transcript variants have been described. Mutations in this gene are a cause of Kostmann syndrome, also known as severe congenital neutropenia. colony stimulating factor 3 receptor 1441
CKM ENSG00000104879 The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. creatine kinase, M-type 1158
MKNK2 ENSG00000099875 This gene encodes a member of the calcium/calmodulin-dependent protein kinases (CAMK) Ser/Thr protein kinase family, which belongs to the protein kinase superfamily. This protein contains conserved DLG (asp-leu-gly) and ENIL (glu-asn-ile-leu) motifs, and an N-terminal polybasic region which binds importin A and the translation factor scaffold protein eukaryotic initiation factor 4G (eIF4G). This protein is one of the downstream kinases activated by mitogen-activated protein (MAP) kinases. It phosphorylates the eukaryotic initiation factor 4E (eIF4E), thus playing important roles in the initiation of mRNA translation, oncogenic transformation and malignant cell proliferation. In addition to eIF4E, this protein also interacts with von Hippel-Lindau tumor suppressor (VHL), ring-box 1 (Rbx1) and Cullin2 (Cul2), which are all components of the CBC(VHL) ubiquitin ligase E3 complex. Multiple alternatively spliced transcript variants have been found, but the full-length nature and biological activity of only two variants are determined. These two variants encode distinct isoforms which differ in activity and regulation, and in subcellular localization. MAP kinase interacting serine/threonine kinase 2 2872
COL4A2 ENSG00000134871 This gene encodes one of the six subunits of type IV collagen, the major structural component of basement membranes. The C-terminal portion of the protein, known as canstatin, is an inhibitor of angiogenesis and tumor growth. Like the other members of the type IV collagen gene family, this gene is organized in a head-to-head conformation with another type IV collagen gene so that each gene pair shares a common promoter. collagen type IV alpha 2 1284
GAPDH ENSG00000111640 This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. glyceraldehyde-3-phosphate dehydrogenase 2597
TPM3 ENSG00000143549 This gene encodes a member of the tropomyosin family of actin-binding proteins. Tropomyosins are dimers of coiled-coil proteins that provide stability to actin filaments and regulate access of other actin-binding proteins. Mutations in this gene result in autosomal dominant nemaline myopathy and other muscle disorders. This locus is involved in translocations with other loci, including anaplastic lymphoma receptor tyrosine kinase (ALK) and neurotrophic tyrosine kinase receptor type 1 (NTRK1), which result in the formation of fusion proteins that act as oncogenes. There are numerous pseudogenes for this gene on different chromosomes. Alternative splicing results in multiple transcript variants. tropomyosin 3 7170
MEDAG ENSG00000102802 NA mesenteric estrogen dependent adipogenesis 84935
FLOT2 ENSG00000132589 Caveolae are small domains on the inner cell membrane involved in vesicular trafficking and signal transduction. This gene encodes a caveolae-associated, integral membrane protein, which is thought to function in neuronal signaling. flotillin 2 2319
FLNC ENSG00000128591 This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene. filamin C 2318
SERPINE1 ENSG00000106366 This gene encodes a member of the serine proteinase inhibitor (serpin) superfamily. This member is the principal inhibitor of tissue plasminogen activator (tPA) and urokinase (uPA), and hence is an inhibitor of fibrinolysis. Defects in this gene are the cause of plasminogen activator inhibitor-1 deficiency (PAI-1 deficiency), and high concentrations of the gene product are associated with thrombophilia. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. serpin family E member 1 5054
MYH7 ENSG00000092054 Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. myosin, heavy chain 7, cardiac muscle, beta 4625
COL1A1 ENSG00000108821 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. collagen type I alpha 1 1277
EHBP1L1 ENSG00000173442 NA EH domain binding protein 1 like 1 254102
AC019349.5 ENSG00000229732 NA NA ENSG00000229732
HBA1 ENSG00000206172 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. hemoglobin subunit alpha 1 3039
CSTA ENSG00000121552 The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins, and kininogens. This gene encodes a stefin that functions as a cysteine protease inhibitor, forming tight complexes with papain and the cathepsins B, H, and L. The protein is one of the precursor proteins of cornified cell envelope in keratinocytes and plays a role in epidermal development and maintenance. Stefins have been proposed as prognostic and diagnostic tools for cancer. cystatin A 1475
IL1RN ENSG00000136689 The protein encoded by this gene is a member of the interleukin 1 cytokine family. This protein inhibits the activities of interleukin 1, alpha (IL1A) and interleukin 1, beta (IL1B), and modulates a variety of interleukin 1 related immune and inflammatory responses. This gene and five other closely related cytokine genes form a gene cluster spanning approximately 400 kb on chromosome 2. A polymorphism of this gene is reported to be associated with increased risk of osteoporotic fractures and gastric cancer. Several alternatively spliced transcript variants encoding distinct isoforms have been reported. interleukin 1 receptor antagonist 3557
TTN ENSG00000155657 This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. titin 7273
RHOG ENSG00000177105 This gene encodes a member of the Rho family of small GTPases, which cycle between inactive GDP-bound and active GTP-bound states and function as molecular switches in signal transduction cascades. Rho proteins promote reorganization of the actin cytoskeleton and regulate cell shape, attachment, and motility. The encoded protein facilitates translocation of a functional guanine nucleotide exchange factor (GEF) complex from the cytoplasm to the plasma membrane where ras-related C3 botulinum toxin substrate 1 is activated to promote lamellipodium formation and cell migration. Two related pseudogenes have been identified on chromosomes 20 and X. ras homolog family member G 391
SPRR3 ENSG00000163209 NA small proline rich protein 3 6707
SPI1 ENSG00000066336 This gene encodes an ETS-domain transcription factor that activates gene expression during myeloid and B-lymphoid cell development. The nuclear protein binds to a purine-rich sequence known as the PU-box found near the promoters of target genes, and regulates their expression in coordination with other transcription factors and cofactors. The protein can also regulate alternative splicing of target genes. Multiple transcript variants encoding different isoforms have been found for this gene. Spi-1 proto-oncogene 6688
NCF4 ENSG00000100365 The protein encoded by this gene is a cytosolic regulatory component of the superoxide-producing phagocyte NADPH-oxidase, a multicomponent enzyme system important for host defense. This protein is preferentially expressed in cells of myeloid lineage. It interacts primarily with neutrophil cytosolic factor 2 (NCF2/p67-phox) to form a complex with neutrophil cytosolic factor 1 (NCF1/p47-phox), which further interacts with the small G protein RAC1 and translocates to the membrane upon cell stimulation. This complex then activates flavocytochrome b, the membrane-integrated catalytic core of the enzyme system. The PX domain of this protein can bind phospholipid products of the PI(3) kinase, which suggests its role in PI(3) kinase-mediated signaling events. The phosphorylation of this protein was found to negatively regulate the enzyme activity. Alternatively spliced transcript variants encoding distinct isoforms have been observed. neutrophil cytosolic factor 4 4689
CSTB ENSG00000160213 The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and kininogens. This gene encodes a stefin that functions as an intracellular thiol protease inhibitor. The protein is able to form a dimer stabilized by noncovalent forces, inhibiting papain and cathepsins l, h and b. The protein is thought to play a role in protecting against the proteases leaking from lysosomes. Evidence indicates that mutations in this gene are responsible for the primary defects in patients with progressive myoclonic epilepsy (EPM1). cystatin B 1476
ALDOA ENSG00000149925 The protein encoded by this gene, Aldolase A (fructose-bisphosphate aldolase), is a glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Three aldolase isozymes (A, B, and C), encoded by three different genes, are differentially expressed during development. Aldolase A is found in the developing embryo and is produced in even greater amounts in adult muscle. Aldolase A expression is repressed in adult liver, kidney and intestine and similar to aldolase C levels in brain and other nervous tissue. Aldolase A deficiency has been associated with myopathy and hemolytic anemia. Alternative splicing and alternative promoter usage results in multiple transcript variants. Related pseudogenes have been identified on chromosomes 3 and 10. aldolase, fructose-bisphosphate A 226
KRT7 ENSG00000135480 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the simple epithelia lining the cavities of the internal organs and in the gland ducts and blood vessels. The genes encoding the type II cytokeratins are clustered in a region of chromosome 12q12-q13. Alternative splicing may result in several transcript variants; however, not all variants have been fully described. keratin 7 3855
A2M ENSG00000175899 Alpha-2-macroglobulin is a protease inhibitor and cytokine transporter. It inhibits many proteases, including trypsin, thrombin and collagenase. A2M is implicated in Alzheimer disease (AD) due to its ability to mediate the clearance and degradation of A-beta, the major component of beta-amyloid deposits. alpha-2-macroglobulin 2
HP ENSG00000257017 This gene encodes a preproprotein, which is processed to yield both alpha and beta chains, which subsequently combine as a tetramer to produce haptoglobin. Haptoglobin functions to bind free plasma hemoglobin, which allows degradative enzymes to gain access to the hemoglobin, while at the same time preventing loss of iron through the kidneys and protecting the kidneys from damage by hemoglobin. Mutations in this gene and/or its regulatory regions cause ahaptoglobinemia or hypohaptoglobinemia. This gene has also been linked to diabetic nephropathy, the incidence of coronary artery disease in type 1 diabetes, Crohn’s disease, inflammatory disease behavior, primary sclerosing cholangitis, susceptibility to idiopathic Parkinson’s disease, and a reduced incidence of Plasmodium falciparum malaria. The protein encoded also exhibits antimicrobial activity against bacteria. A similar duplicated gene is located next to this gene on chromosome 16. Multiple transcript variants encoding different isoforms have been found for this gene. haptoglobin 3240
GP2 ENSG00000169347 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. glycoprotein 2 2813
ATG16L2 ENSG00000168010 NA autophagy related 16 like 2 89849
TALDO1 ENSG00000177156 Transaldolase 1 is a key enzyme of the nonoxidative pentose phosphate pathway providing ribose-5-phosphate for nucleic acid synthesis and NADPH for lipid biosynthesis. This pathway can also maintain glutathione at a reduced state and thus protect sulfhydryl groups and cellular integrity from oxygen radicals. The functional gene of transaldolase 1 is located on chromosome 11 and a pseudogene is identified on chromosome 1 but there are conflicting map locations. The second and third exon of this gene were developed by insertion of a retrotransposable element. This gene is thought to be involved in multiple sclerosis. transaldolase 1 6888
SELPLG ENSG00000110876 This gene encodes a glycoprotein that functions as a high affinity counter-receptor for the cell adhesion molecules P-, E- and L- selectin expressed on myeloid cells and stimulated T lymphocytes. As such, this protein plays a critical role in leukocyte trafficking during inflammation by tethering of leukocytes to activated platelets or endothelia expressing selectins. This protein requires two post-translational modifications, tyrosine sulfation and the addition of the sialyl Lewis x tetrasaccharide (sLex) to its O-linked glycans, for its high-affinity binding activity. Aberrant expression of this gene and polymorphisms in this gene are associated with defects in the innate and adaptive immune response. Alternate splicing results in multiple transcript variants. selectin P ligand 6404
PYGM ENSG00000068976 This gene encodes a muscle enzyme involved in glycogenolysis. Highly similar enzymes encoded by different genes are found in liver and brain. Mutations in this gene are associated with McArdle disease (myophosphorylase deficiency), a glycogen storage disease of muscle. Alternative splicing results in multiple transcript variants. phosphorylase, glycogen, muscle 5837
MXD1 ENSG00000059728 This gene encodes a member of the MYC/MAX/MAD network of basic helix-loop-helix leucine zipper transcription factors. The MYC/MAX/MAD transcription factors mediate cellular proliferation, differentiation and apoptosis. The encoded protein antagonizes MYC-mediated transcriptional activation of target genes by competing for the binding partner MAX and recruiting repressor complexes containing histone deacetylases. Mutations in this gene may play a role in acute leukemia, and the encoded protein is a potential tumor suppressor. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. MAX dimerization protein 1 4084
REG1B ENSG00000172023 This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. regenerating family member 1 beta 5968
PACSIN3 ENSG00000165912 This gene is a member of the protein kinase C and casein kinase substrate in neurons family. The encoded protein is involved in linking the actin cytoskeleton with vesicle formation. Alternative splicing results in multiple transcript variants. protein kinase C and casein kinase substrate in neurons 3 29763
GPSM3 ENSG00000213654 NA G-protein signaling modulator 3 63940
RBM38 ENSG00000132819 NA RNA binding motif protein 38 55544
ATP1A1 ENSG00000163399 The protein encoded by this gene belongs to the family of P-type cation transport ATPases, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The catalytic subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes an alpha 1 subunit. Multiple transcript variants encoding different isoforms have been found for this gene. ATPase Na+/K+ transporting subunit alpha 1 476
ATP2A2 ENSG00000174437 This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol into the sarcoplasmic reticulum lumen, and is involved in regulation of the contraction/relaxation cycle. Mutations in this gene cause Darier-White disease, also known as keratosis follicularis, an autosomal dominant skin disorder characterized by loss of adhesion between epidermal cells and abnormal keratinization. Alternative splicing results in multiple transcript variants encoding different isoforms. ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 2 488
TGM2 ENSG00000198959 Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene acts as a monomer, is induced by retinoic acid, and appears to be involved in apoptosis. Finally, the encoded protein is the autoantigen implicated in celiac disease. Two transcript variants encoding different isoforms have been found for this gene. transglutaminase 2 7052
TYROBP ENSG00000011600 This gene encodes a transmembrane signaling polypeptide which contains an immunoreceptor tyrosine-based activation motif (ITAM) in its cytoplasmic domain. The encoded protein may associate with the killer-cell inhibitory receptor (KIR) family of membrane glycoproteins and may act as an activating signal transduction element. This protein may bind zeta-chain (TCR) associated protein kinase 70kDa (ZAP-70) and spleen tyrosine kinase (SYK) and play a role in signal transduction, bone modeling, brain myelination, and inflammation. Mutations within this gene have been associated with polycystic lipomembranous osteodysplasia with sclerosing leukoencephalopathy (PLOSL), also known as Nasu-Hakola disease. Its putative receptor, triggering receptor expressed on myeloid cells 2 (TREM2), also causes PLOSL. Multiple alternative transcript variants encoding distinct isoforms have been identified for this gene. TYRO protein tyrosine kinase binding protein 7305
RAB10 ENSG00000084733 RAB10 belongs to the RAS (see HRAS; MIM 190020) superfamily of small GTPases. RAB proteins localize to exocytic and endocytic compartments and regulate intracellular vesicle trafficking (Bao et al., 1998 [PubMed 9918381]). RAB10, member RAS oncogene family 10890
SPINK1 ENSG00000164266 The protein encoded by this gene is a trypsin inhibitor, which is secreted from pancreatic acinar cells into pancreatic juice. It is thought to function in the prevention of trypsin-catalyzed premature activation of zymogens within the pancreas and the pancreatic duct. Mutations in this gene are associated with hereditary pancreatitis and tropical calcific pancreatitis. serine peptidase inhibitor, Kazal type 1 6690
PI3 ENSG00000124102 This gene encodes an elastase-specific inhibitor that functions as an antimicrobial peptide against Gram-positive and Gram-negative bacteria, and fungal pathogens. The protein contains a WAP-type four-disulfide core (WFDC) domain, and is thus a member of the WFDC domain family. Most WFDC gene members are localized to chromosome 20q12-q13 in two clusters: centromeric and telomeric. This gene belongs to the centromeric cluster. Expression of this gene is upregulated by bacterial lipopolysaccharides and cytokines. peptidase inhibitor 3 5266
CYP17A1 ENSG00000148795 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. It has both 17alpha-hydroxylase and 17,20-lyase activities and is a key enzyme in the steroidogenic pathway that produces progestins, mineralocorticoids, glucocorticoids, androgens, and estrogens. Mutations in this gene are associated with isolated steroid-17 alpha-hydroxylase deficiency, 17-alpha-hydroxylase/17,20-lyase deficiency, pseudohermaphroditism, and adrenal hyperplasia. cytochrome P450 family 17 subfamily A member 1 1586
HSPB1 ENSG00000106211 The protein encoded by this gene is induced by environmental stress and developmental changes. The encoded protein is involved in stress resistance and actin organization and translocates from the cytoplasm to the nucleus upon stress induction. Defects in this gene are a cause of Charcot-Marie-Tooth disease type 2F (CMT2F) and distal hereditary motor neuropathy (dHMN). heat shock protein family B (small) member 1 3315
UNC13D ENSG00000092929 This gene encodes a protein that is a member of the UNC13 family, containing similar domain structure as other family members but lacking an N-terminal phorbol ester-binding C1 domain present in other Munc13 proteins. The protein appears to play a role in vesicle maturation during exocytosis and is involved in regulation of cytolytic granules secretion. Mutations in this gene are associated with familial hemophagocytic lymphohistiocytosis type 3, a genetically heterogeneous, rare autosomal recessive disorder. unc-13 homolog D 201294
MYBPC1 ENSG00000196091 This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. myosin binding protein C, slow type 4604
MYL3 ENSG00000160808 MYL3 encodes myosin light chain 3, an alkali light chain also referred to in the literature as both the ventricular isoform and the slow skeletal muscle isoform. Mutations in MYL3 have been identified as a cause of mid-left ventricular chamber type hypertrophic cardiomyopathy. myosin light chain 3 4634
PRSS1 ENSG00000204983 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. protease, serine 1 5644
HCK ENSG00000101336 The protein encoded by this gene is a member of the Src family of tyrosine kinases. This protein is primarily hemopoietic, particularly in cells of the myeloid and B-lymphoid lineages. It may help couple the Fc receptor to the activation of the respiratory burst. In addition, it may play a role in neutrophil migration and in the degranulation of neutrophils. Multiple isoforms with different subcellular distributions are produced due to both alternative splicing and the use of alternative translation initiation codons, including a non-AUG (CUG) codon. HCK proto-oncogene, Src family tyrosine kinase 3055
ABTB1 ENSG00000114626 This gene encodes a protein with an ankyrin repeat region and two BTB/POZ domains, which are thought to be involved in protein-protein interactions. Expression of this gene is activated by the phosphatase and tensin homolog, a tumor suppressor. Alternate splicing results in three transcript variants. ankyrin repeat and BTB domain containing 1 80325
CPA1 ENSG00000091704 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. carboxypeptidase A1 1357
GPX3 ENSG00000211445 This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. glutathione peroxidase 3 2878
PTGDS ENSG00000107317 The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may be also involved in the regulation of non-rapid eye movement sleep. prostaglandin D2 synthase 5730
SYNM ENSG00000182253 The protein encoded by this gene is an intermediate filament (IF) family member. IF proteins are cytoskeletal proteins that confer resistance to mechanical stress and are encoded by a dispersed multigene family. This protein has been found to form a linkage between desmin, which is a subunit of the IF network, and the extracellular matrix, and provides an important structural support in muscle. Two alternatively spliced variants encoding different isoforms have been described for this gene. synemin 23336
NEB ENSG00000183091 This gene encodes nebulin, a giant protein component of the cytoskeletal matrix that coexists with the thick and thin filaments within the sarcomeres of skeletal muscle. In most vertebrates, nebulin accounts for 3 to 4% of the total myofibrillar protein. The encoded protein contains approximately 30-amino acid long modules that can be classified into 7 types and other repeated modules. Protein isoform sizes vary from 600 to 800 kD due to alternative splicing that is tissue-, species-, and developmental stage-specific. Of the 183 exons in the nebulin gene, at least 43 are alternatively spliced, although exons 143 and 144 are not found in the same transcript. Of the several thousand transcript variants predicted for nebulin, the RefSeq Project has decided to create three representative RefSeq records. Mutations in this gene are associated with recessive nemaline myopathy. nebulin 4703
TNNI1 ENSG00000159173 Troponin proteins associate with tropomyosin and regulate the calcium sensitivity of the myofibril contractile apparatus of striated muscles. Troponin I (TnI), along with troponin T (TnT) and troponin C (TnC), is one of 3 subunits that form the troponin complex of the thin filaments of striated muscle. TnI is the inhibitory subunit; blocking actin-myosin interactions and thereby mediating striated muscle relaxation. The TnI subfamily contains three genes: TnI-skeletal-fast-twitch, TnI-skeletal-slow-twitch, and TnI-cardiac. The TnI-fast and TnI-slow genes are expressed in fast-twitch and slow-twitch skeletal muscle fibers, respectively, while the TnI-cardiac gene is expressed exclusively in cardiac muscle tissue. This gene encodes the Troponin-I-skeletal-slow-twitch protein. This gene is expressed in cardiac and skeletal muscle during early development but is restricted to slow-twitch skeletal muscle fibers in adults. The encoded protein prevents muscle contraction by inhibiting calcium-mediated conformational changes in actin-myosin complexes. troponin I1, slow skeletal type 7135
CRNN ENSG00000143536 This gene encodes a member of the ‘fused gene’ family of proteins, which contain N-terminus EF-hand domains and multiple tandem peptide repeats. The encoded protein contains two EF-hand Ca2+ binding domains in its N-terminus and two glutamine- and threonine-rich 60 amino acid repeats in its C-terminus. This gene, also known as squamous epithelial heat shock protein 53, may play a role in the mucosal/epithelial immune response and epidermal differentiation. cornulin 49860
CRIM1 ENSG00000150938 This gene encodes a transmembrane protein containing six cysteine-rich repeat domains and an insulin-like growth factor-binding domain. The encoded protein may play a role in tissue development through interactions with members of the transforming growth factor beta family, such as bone morphogenetic proteins. cysteine rich transmembrane BMP regulator 1 (chordin-like) 51232
CELA3A ENSG00000142789 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. chymotrypsin like elastase family member 3A 10136
APP ENSG00000142192 This gene encodes a cell surface receptor and transmembrane precursor protein that is cleaved by secretases to form a number of peptides. Some of these peptides are secreted and can bind to the acetyltransferase complex APBB1/TIP60 to promote transcriptional activation, while others form the protein basis of the amyloid plaques found in the brains of patients with Alzheimer disease. In addition, two of the peptides are antimicrobial peptides, having been shown to have bacteriocidal and antifungal activities. Mutations in this gene have been implicated in autosomal dominant Alzheimer disease and cerebroarterial amyloidosis (cerebral amyloid angiopathy). Multiple transcript variants encoding several different isoforms have been found for this gene. amyloid beta precursor protein 351
FGA ENSG00000171560 This gene encodes the alpha subunit of the coagulation factor fibrinogen, which is a component of the blood clot. Following vascular injury, the encoded preproprotein is proteolytically processed by thrombin during the conversion of fibrinogen to fibrin. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia, afibrinogenemia and renal amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. fibrinogen alpha chain 2243
SYNPO2 ENSG00000172403 NA synaptopodin 2 171024
HADHA ENSG00000084754 This gene encodes the alpha subunit of the mitochondrial trifunctional protein, which catalyzes the last three steps of mitochondrial beta-oxidation of long chain fatty acids. The mitochondrial membrane-bound heterocomplex is composed of four alpha and four beta subunits, with the alpha subunit catalyzing the 3-hydroxyacyl-CoA dehydrogenase and enoyl-CoA hydratase activities. Mutations in this gene result in trifunctional protein deficiency or LCHAD deficiency. The genes of the alpha and beta subunits of the mitochondrial trifunctional protein are located adjacent to each other in the human genome in a head-to-head orientation. hydroxyacyl-CoA dehydrogenase/3-ketoacyl-CoA thiolase/enoyl-CoA hydratase (trifunctional protein), alpha subunit 3030
CPA2 ENSG00000158516 Three different forms of human pancreatic procarboxypeptidase A have been isolated. The encoded protein represents the A2 form, which is a monomeric protein with different biochemical properties from the A1 and A3 forms. The A2 form of pancreatic procarboxypeptidase acts on aromatic C-terminal residues and is a secreted protein. carboxypeptidase A2 1358
GFPT1 ENSG00000198380 This gene encodes the first and rate-limiting enzyme of the hexosamine pathway and controls the flux of glucose into the hexosamine pathway. The product of this gene catalyzes the formation of glucosamine 6-phosphate. glutamine–fructose-6-phosphate transaminase 1 2673
KRT15 ENSG00000171346 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains and are clustered in a region on chromosome 17q21.2. keratin 15 3866
MB ENSG00000198125 This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. myoglobin 4151
FCER1G ENSG00000158869 The high affinity IgE receptor is a key molecule involved in allergic reactions. It is a tetramer composed of 1 alpha, 1 beta, and 2 gamma chains. The gamma chains are also subunits of other Fc receptors. Fc fragment of IgE receptor Ig 2207
YBX3 ENSG00000060138 NA Y-box binding protein 3 8531
ARHGAP9 ENSG00000123329 This gene encodes a member of the Rho-GAP family of GTPase activating proteins. The protein has substantial GAP activity towards several Rho-family GTPases in vitro, converting them to an inactive GDP-bound state. It is implicated in regulating adhesion of hematopoietic cells to the extracellular matrix. Multiple transcript variants encoding different isoforms have been found for this gene. Rho GTPase activating protein 9 64333
MYOZ1 ENSG00000177791 The protein encoded by this gene is primarily expressed in the skeletal muscle, and belongs to the myozenin family. Members of this family function as calcineurin-interacting proteins that help tether calcineurin to the sarcomere of cardiac and skeletal muscle. They play an important role in modulation of calcineurin signaling. myozenin 1 58529
CSK ENSG00000103653 NA c-src tyrosine kinase 1445
PTPRG ENSG00000144724 The protein encoded by this gene is a member of the protein tyrosine phosphatase (PTP) family. PTPs are known to be signaling molecules that regulate a variety of cellular processes including cell growth, differentiation, mitotic cycle, and oncogenic transformation. This PTP possesses an extracellular region, a single transmembrane region, and two tandem intracytoplasmic catalytic domains, and thus represents a receptor-type PTP. The extracellular region of this PTP contains a carbonic anhydrase-like (CAH) domain, which is also found in the extracellular region of PTPRBETA/ZETA. This gene is located in a chromosomal region that is frequently deleted in renal cell carcinoma and lung carcinoma, thus is thought to be a candidate tumor suppressor gene. protein tyrosine phosphatase, receptor type G 5793
PRSS3 ENSG00000010438 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is expressed in the brain and pancreas and is resistant to common trypsin inhibitors. It is active on peptide linkages involving the carboxyl group of lysine or arginine. This gene is localized to the locus of T cell receptor beta variable orphans on chromosome 9. Four transcript variants encoding different isoforms have been described for this gene. protease, serine 3 5646
ACSL1 ENSG00000151726 The protein encoded by this gene is an isozyme of the long-chain fatty-acid-coenzyme A ligase family. Although differing in substrate specificity, subcellular localization, and tissue distribution, all isozymes of this family convert free long-chain fatty acids into fatty acyl-CoA esters, and thereby play a key role in lipid biosynthesis and fatty acid degradation. Several transcript variants encoding different isoforms have been found for this gene. acyl-CoA synthetase long-chain family member 1 2180
B3GNT8 ENSG00000177191 NA UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 8 374907
ARHGAP27 ENSG00000159314 This gene encodes a member of a large family of proteins that activate Rho-type guanosine triphosphate (GTP) metabolizing enzymes. The encoded protein may play a role in clathrin-mediated endocytosis. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. Rho GTPase activating protein 27 201176
ADCK3 ENSG00000163050 This gene encodes a mitochondrial protein similar to yeast ABC1, which functions in an electron-transferring membrane protein complex in the respiratory chain. It is not related to the family of ABC transporter proteins. Expression of this gene is induced by the tumor suppressor p53 and in response to DNA damage, and inhibiting its expression partially suppresses p53-induced apoptosis. Alternatively spliced transcript variants have been found; however, their full-length nature has not been determined. aarF domain containing kinase 3 56997
MYH6 ENSG00000197616 Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. myosin, heavy chain 6, cardiac muscle, alpha 4624
IL1R2 ENSG00000115590 The protein encoded by this gene is a cytokine receptor that belongs to the interleukin 1 receptor family. This protein binds interleukin alpha (IL1A), interleukin beta (IL1B), and interleukin 1 receptor, type I (IL1R1/IL1RA), and acts as a decoy receptor that inhibits the activity of its ligands. Interleukin 4 (IL4) is reported to antagonize the activity of interleukin 1 by inducing the expression and release of this cytokine. This gene and three other genes form a cytokine receptor gene cluster on chromosome 2q12. Alternative splicing results in multiple transcript variants and protein isoforms. Alternative splicing produces both membrane-bound and soluble proteins. A soluble protein is also produced by proteolytic cleavage. interleukin 1 receptor type 2 7850
TG ENSG00000042832 Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. thyroglobulin 7038
ADIRF ENSG00000148671 APM2 gene is exclusively expressed in adipose tissue. Its function is currently unknown. adipogenesis regulatory factor 10974
RAB5B ENSG00000111540 NA RAB5B, member RAS oncogene family 5869
MMP25 ENSG00000008516 Proteins of the matrix metalloproteinase (MMP) family are involved in the breakdown of extracellular matrix in normal physiological processes, such as embryonic development, reproduction, and tissue remodeling, as well as in disease processes, such as arthritis and metastasis. Most MMPs are secreted as inactive proproteins which are activated when cleaved by extracellular proteinases. However, the protein encoded by this gene is a member of the membrane-type MMP (MT-MMP) subfamily, attached to the plasma membrane via a glycosylphosphatidyl inositol anchor. In response to bacterial infection or inflammation, the encoded protein is thought to inactivate alpha-1 proteinase inhibitor, a major tissue protectant against proteolytic enzymes released by activated neutrophils, facilitating the transendothelial migration of neutrophils to inflammatory sites. The encoded protein may also play a role in tumor invasion and metastasis through activation of MMP2. The gene has previously been referred to as MMP20 but has been renamed MMP25. matrix metallopeptidase 25 64386
NLRX1 ENSG00000160703 The protein encoded by this gene is a member of the NLR family and localizes to the outer mitochondrial membrane. The encoded protein is a regulator of mitochondrial antivirus responses. Three transcript variants encoding the same protein have been found for this gene. NLR family member X1 79671
EPAS1 ENSG00000116016 This gene encodes a transcription factor involved in the induction of genes regulated by oxygen, which is induced as oxygen levels fall. The encoded protein contains a basic-helix-loop-helix domain protein dimerization domain as well as a domain found in proteins in signal transduction pathways which respond to oxygen levels. Mutations in this gene are associated with erythrocytosis familial type 4. endothelial PAS domain protein 1 2034
HK3 ENSG00000160883 Hexokinases phosphorylate glucose to produce glucose-6-phosphate, the first step in most glucose metabolism pathways. This gene encodes hexokinase 3. Similar to hexokinases 1 and 2, this allosteric enzyme is inhibited by its product glucose-6-phosphate. hexokinase 3 3101
# Save the Ensembl IDs queried for factor 14, one ID per line
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 14, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
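
The write step above is repeated for every factor with only the index changed. As a minimal sketch, the pattern could be wrapped in a small helper. The function name `write_factor_genes` and its arguments are hypothetical and not part of the original analysis, and `writeLines` is swapped in for `write.table`; for a plain character vector it should give the same one-ID-per-line file. The toy call writes to a temporary directory rather than `../utilities/`, using two Ensembl IDs that appear in the table above.

```r
# Hypothetical helper (an assumption, not the original code): write the
# Ensembl IDs queried for one factor to a text file, one ID per line.
write_factor_genes <- function(gene_ids, factor_index, out_dir) {
  stopifnot(is.character(gene_ids), length(factor_index) == 1)
  out_file <- file.path(out_dir,
                        paste0("gene_names_clus_", factor_index, ".txt"))
  # writeLines emits each ID on its own line, unquoted, with no header
  writeLines(gene_ids, out_file)
  invisible(out_file)
}

# Toy usage with two IDs from the Factor 14 table (TG and MYH6)
toy_ids <- c("ENSG00000042832", "ENSG00000197616")
toy_path <- write_factor_genes(toy_ids, 14, tempdir())
toy_back <- readLines(toy_path)
```

In the analysis itself, `gene_ids` would be `out$query` and `out_dir` the `../utilities/GTEX2013_sparse_fac_sqrt` directory, so each factor needs one call instead of a copied `write.table` block.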

Factor 15 Annotations

out <- mygene::queryMany(gene_list[15,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
kable(as.data.frame(out))
query X_id name summary symbol
ENSG00000042832 7038 thyroglobulin Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. TG
ENSG00000245532 283131 nuclear paraspeckle assembly transcript 1 (non-protein coding) This gene produces a long non-coding RNA (lncRNA) transcribed from the multiple endocrine neoplasia locus. This lncRNA is retained in the nucleus where it forms the core structural component of the paraspeckle sub-organelles. It may act as a transcriptional regulator for numerous genes, including some genes involved in cancer progression. NEAT1
ENSG00000115705 7173 thyroid peroxidase This gene encodes a membrane-bound glycoprotein. The encoded protein acts as an enzyme and plays a central role in thyroid gland function. The protein functions in the iodination of tyrosine residues in thyroglobulin and phenoxy-ester formation between pairs of iodinated tyrosines to generate the thyroid hormones, thyroxine and triiodothyronine. Mutations in this gene are associated with several disorders of thyroid hormonogenesis, including congenital hypothyroidism, congenital goiter, and thyroid hormone organification defect IIA. Multiple transcript variants encoding distinct isoforms have been identified for this gene, but the full-length nature of some variants has not been determined. TPO
ENSG00000125618 7849 paired box 8 This gene encodes a member of the paired box (PAX) family of transcription factors. Members of this gene family typically encode proteins that contain a paired box domain, an octapeptide, and a paired-type homeodomain. This nuclear protein is involved in thyroid follicular cell development and expression of thyroid-specific genes. Mutations in this gene have been associated with thyroid dysgenesis, thyroid follicular carcinomas and atypical follicular thyroid adenomas. Alternatively spliced transcript variants encoding different isoforms have been described. PAX8
ENSG00000168878 6439 surfactant protein B This gene encodes the pulmonary-associated surfactant protein B (SPB), an amphipathic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. The SPB enhances the rate of spreading and increases the stability of surfactant monolayers in vitro. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 1, also called pulmonary alveolar proteinosis due to surfactant protein B deficiency, and are associated with fatal respiratory distress in the neonatal period. Alternatively spliced transcript variants encoding the same protein have been identified. SFTPB
ENSG00000070756 26986 poly(A) binding protein cytoplasmic 1 This gene encodes a poly(A) binding protein. The protein shuttles between the nucleus and cytoplasm and binds to the 3’ poly(A) tail of eukaryotic messenger RNAs via RNA-recognition motifs. The binding of this protein to poly(A) promotes ribosome recruitment and translation initiation; it is also required for poly(A) shortening which is the first step in mRNA decay. The gene is part of a small gene family including three protein-coding genes and several pseudogenes. PABPC1
ENSG00000186081 3852 keratin 5 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the basal layer of the epidermis with family member KRT14. Mutations in these genes have been associated with a complex of diseases termed epidermolysis bullosa simplex. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. KRT5
ENSG00000135480 3855 keratin 7 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the simple epithelia lining the cavities of the internal organs and in the gland ducts and blood vessels. The genes encoding the type II cytokeratins are clustered in a region of chromosome 12q12-q13. Alternative splicing may result in several transcript variants; however, not all variants have been fully described. KRT7
ENSG00000169031 1285 collagen type IV alpha 3 chain Type IV collagen, the major structural component of basement membranes, is a multimeric protein composed of 3 alpha subunits. These subunits are encoded by 6 different genes, alpha 1 through alpha 6, each of which can form a triple helix structure with 2 other subunits to form type IV collagen. This gene encodes alpha 3. In the Goodpasture syndrome, autoantibodies bind to the collagen molecules in the basement membranes of alveoli and glomeruli. The epitopes that elicit these autoantibodies are localized largely to the non-collagenous C-terminal domain of the protein. A specific kinase phosphorylates amino acids in this same C-terminal region and the expression of this kinase is upregulated during pathogenesis. This gene is also linked to an autosomal recessive form of Alport syndrome. The mutations contributing to this syndrome are also located within the exons that encode this C-terminal region. Like the other members of the type IV collagen gene family, this gene is organized in a head-to-head conformation with another type IV collagen gene so that each gene pair shares a common promoter. COL4A3
ENSG00000175899 2 alpha-2-macroglobulin Alpha-2-macroglobulin is a protease inhibitor and cytokine transporter. It inhibits many proteases, including trypsin, thrombin and collagenase. A2M is implicated in Alzheimer disease (AD) due to its ability to mediate the clearance and degradation of A-beta, the major component of beta-amyloid deposits. A2M
ENSG00000155657 7273 titin This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. TTN
ENSG00000128016 7538 ZFP36 ring finger protein NA ZFP36
ENSG00000104879 1158 creatine kinase, M-type The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. CKM
ENSG00000168743 255743 nephronectin NA NPNT
ENSG00000081052 1286 collagen type IV alpha 4 chain This gene encodes one of the six subunits of type IV collagen, the major structural component of basement membranes. This particular collagen IV subunit, however, is only found in a subset of basement membranes. Like the other members of the type IV collagen gene family, this gene is organized in a head-to-head conformation with another type IV collagen gene so that each gene pair shares a common promoter. Mutations in this gene are associated with type II autosomal recessive Alport syndrome (hereditary glomerulonephropathy) and with familial benign hematuria (thin basement membrane disease). Two transcripts, differing only in their transcription start sites, have been identified for this gene and, as is common for collagen genes, multiple polyadenylation sites are found in the 3’ UTR. COL4A4
ENSG00000101670 9388 lipase G, endothelial type The protein encoded by this gene has substantial phospholipase activity and may be involved in lipoprotein metabolism and vascular biology. This protein is designated a member of the TG lipase family by its sequence and characteristic lid region which provides substrate specificity for enzymes of the TG lipase family. LIPG
ENSG00000197616 4624 myosin, heavy chain 6, cardiac muscle, alpha Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. MYH6
ENSG00000185133 27124 inositol polyphosphate-5-phosphatase J NA INPP5J
ENSG00000156508 1915 eukaryotic translation elongation factor 1 alpha 1 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas, and the other isoform (alpha 2) is expressed in brain, heart and skeletal muscle. This isoform is identified as an autoantigen in 66% of patients with Felty syndrome. This gene has been found to have multiple copies on many chromosomes, some of which, if not all, represent different pseudogenes. EEF1A1
ENSG00000075624 60 actin, beta This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. ACTB
ENSG00000224078 ENSG00000224078 small nucleolar RNA host gene 14 NA SNHG14
ENSG00000153002 1360 carboxypeptidase B1 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma. CPB1
ENSG00000106366 5054 serpin family E member 1 This gene encodes a member of the serine proteinase inhibitor (serpin) superfamily. This member is the principal inhibitor of tissue plasminogen activator (tPA) and urokinase (uPA), and hence is an inhibitor of fibrinolysis. Defects in this gene are the cause of plasminogen activator inhibitor-1 deficiency (PAI-1 deficiency), and high concentrations of the gene product are associated with thrombophilia. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. SERPINE1
ENSG00000134020 157310 phosphatidylethanolamine binding protein 4 The phosphatidylethanolamine (PE)-binding proteins, including PEBP4, are an evolutionarily conserved family of proteins with pivotal biologic functions, such as lipid binding and inhibition of serine proteases (Wang et al., 2004 [PubMed 15302887]). PEBP4
ENSG00000164309 202333 cardiomyopathy associated 5 NA CMYA5
ENSG00000119681 4053 latent transforming growth factor beta binding protein 2 The protein encoded by this gene belongs to the family of latent transforming growth factor (TGF)-beta binding proteins (LTBP), which are extracellular matrix proteins with multi-domain structure. This protein is the largest member of the LTBP family possessing unique regions and with most similarity to the fibrillins. It has thus been suggested that it may have multiple functions: as a member of the TGF-beta latent complex, as a structural component of microfibrils, and a role in cell adhesion. LTBP2
ENSG00000092841 4637 myosin light chain 6 Myosin is a hexameric ATPase cellular motor protein. It is composed of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable regulatory light chains. This gene encodes a myosin alkali light chain that is expressed in smooth muscle and non-muscle tissues. Genomic sequences representing several pseudogenes have been described and two transcript variants encoding different isoforms have been identified for this gene. MYL6
ENSG00000205517 57139 ral guanine nucleotide dissociation stimulator like 3 NA RGL3
ENSG00000197766 1675 complement factor D This gene encodes a member of the S1, or chymotrypsin, family of serine peptidases. This protease catalyzes the cleavage of factor B, the rate-limiting step of the alternative pathway of complement activation. This protein also functions as an adipokine, a cell signaling protein secreted by adipocytes, which regulates insulin secretion in mice. Mutations in this gene underlie complement factor D deficiency, which is associated with recurrent bacterial meningitis infections in human patients. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate the mature protease. CFD
ENSG00000147459 80005 dedicator of cytokinesis 5 NA DOCK5
ENSG00000164692 1278 collagen type I alpha 2 chain This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. COL1A2
ENSG00000187266 2057 erythropoietin receptor This gene encodes the erythropoietin receptor which is a member of the cytokine receptor family. Upon erythropoietin binding, this receptor activates Jak2 tyrosine kinase which activates different intracellular pathways including: Ras/MAP kinase, phosphatidylinositol 3-kinase and STAT transcription factors. The stimulated erythropoietin receptor appears to have a role in erythroid cell survival. Defects in the erythropoietin receptor may produce erythroleukemia and familial erythrocytosis. Dysregulation of this gene may affect the growth of certain tumors. Alternate splicing results in multiple transcript variants. EPOR
ENSG00000109846 1410 crystallin alpha B Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. CRYAB
ENSG00000050767 91522 collagen type XXIII alpha 1 chain COL23A1 is a member of the transmembrane collagens, a subfamily of the nonfibrillar collagens that contain a single pass hydrophobic transmembrane domain (Banyard et al., 2003 [PubMed 12644459]). COL23A1
ENSG00000205420 3853 keratin 6A The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. As many as six of this type II cytokeratin (KRT6) have been identified; the multiplicity of the genes is attributed to successive gene duplication events. The genes are expressed with family members KRT16 and/or KRT17 in the filiform papillae of the tongue, the stratified epithelial lining of oral mucosa and esophagus, the outer root sheath of hair follicles, and the glandular epithelia. This KRT6 gene in particular encodes the most abundant isoform. Mutations in these genes have been associated with pachyonychia congenita. In addition, peptides from the C-terminal region of the protein have antimicrobial activity against bacterial pathogens. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. KRT6A
ENSG00000119280 84886 chromosome 1 open reading frame 198 NA C1orf198
ENSG00000204305 177 advanced glycosylation end product-specific receptor The advanced glycosylation end product (AGE) receptor encoded by this gene is a member of the immunoglobulin superfamily of cell surface receptors. It is a multiligand receptor, and besides AGE, interacts with other molecules implicated in homeostasis, development, and inflammation, and certain diseases, such as diabetes and Alzheimer’s disease. Many alternatively spliced transcript variants encoding different isoforms, as well as non-protein-coding variants, have been described for this gene (PMID:18089847). AGER
ENSG00000169347 2813 glycoprotein 2 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. GP2
ENSG00000112936 730 complement component 7 C7 is a component of the complement system. It participates in the formation of Membrane Attack Complex (MAC). People with C7 deficiency are prone to bacterial infection. C7
ENSG00000137393 255488 ring finger protein 144B NA RNF144B
ENSG00000091704 1357 carboxypeptidase A1 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. CPA1
ENSG00000125148 4502 metallothionein 2A NA MT2A
ENSG00000167588 2819 glycerol-3-phosphate dehydrogenase 1 This gene encodes a member of the NAD-dependent glycerol-3-phosphate dehydrogenase family. The encoded protein plays a critical role in carbohydrate and lipid metabolism by catalyzing the reversible conversion of dihydroxyacetone phosphate (DHAP) and reduced nicotine adenine dinucleotide (NADH) to glycerol-3-phosphate (G3P) and NAD+. The encoded cytosolic protein and mitochondrial glycerol-3-phosphate dehydrogenase also form a glycerol phosphate shuttle that facilitates the transfer of reducing equivalents from the cytosol to mitochondria. Mutations in this gene are a cause of transient infantile hypertriglyceridemia. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. GPD1
ENSG00000124253 5105 phosphoenolpyruvate carboxykinase 1 This gene is a main control point for the regulation of gluconeogenesis. The cytosolic enzyme encoded by this gene, along with GTP, catalyzes the formation of phosphoenolpyruvate from oxaloacetate, with the release of carbon dioxide and GDP. The expression of this gene can be regulated by insulin, glucocorticoids, glucagon, cAMP, and diet. Defects in this gene are a cause of cytosolic phosphoenolpyruvate carboxykinase deficiency. A mitochondrial isozyme of the encoded protein also has been characterized. PCK1
ENSG00000136153 4008 LIM domain 7 This gene encodes a protein containing a calponin homology (CH) domain, a PDZ domain, and a LIM domain, and may be involved in protein-protein interactions. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene; however, the full-length nature of some variants is not known. LMO7
ENSG00000005884 3675 integrin subunit alpha 3 The gene encodes a member of the integrin alpha chain family of proteins. Integrins are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain that function as cell surface adhesion molecules. The encoded preproprotein is proteolytically processed to generate light and heavy chains that comprise the alpha 3 subunit. This subunit joins with a beta 1 subunit to form an integrin that interacts with extracellular matrix proteins including members of the laminin family. Expression of this gene may be correlated with breast cancer metastasis. ITGA3
ENSG00000130600 283120 H19, imprinted maternally expressed transcript (non-protein coding) This gene is located in an imprinted region of chromosome 11 near the insulin-like growth factor 2 (IGF2) gene. This gene is only expressed from the maternally-inherited chromosome, whereas IGF2 is only expressed from the paternally-inherited chromosome. The product of this gene is a long non-coding RNA which functions as a tumor suppressor. Mutations in this gene have been associated with Beckwith-Wiedemann Syndrome and Wilms tumorigenesis. Alternative splicing results in multiple transcript variants. H19
ENSG00000100316 6122 ribosomal protein L3 Ribosomes, the complexes that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L3P family of ribosomal proteins and it is located in the cytoplasm. The protein can bind to the HIV-1 TAR mRNA, and it has been suggested that the protein contributes to tat-mediated transactivation. This gene is co-transcribed with several small nucleolar RNA genes, which are located in several of this gene’s introns. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. RPL3
ENSG00000174807 57124 CD248 molecule NA CD248
ENSG00000138207 5950 retinol binding protein 4 This protein belongs to the lipocalin family and is the specific carrier for retinol (vitamin A alcohol) in the blood. It delivers retinol from the liver stores to the peripheral tissues. In plasma, the RBP-retinol complex interacts with transthyretin which prevents its loss by filtration through the kidney glomeruli. A deficiency of vitamin A blocks secretion of the binding protein posttranslationally and results in defective delivery and supply to the epidermal cells. RBP4
ENSG00000125780 7053 transglutaminase 3 Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene consists of two polypeptide chains activated from a single precursor protein by proteolysis. The encoded protein is involved in the later stages of cell envelope formation in the epidermis and hair follicle. TGM3
ENSG00000077522 88 actinin alpha 2 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a muscle-specific, alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Several transcript variants encoding different isoforms have been found for this gene. ACTN2
ENSG00000116016 2034 endothelial PAS domain protein 1 This gene encodes a transcription factor involved in the induction of genes regulated by oxygen, which is induced as oxygen levels fall. The encoded protein contains a basic helix-loop-helix protein dimerization domain as well as a domain found in proteins in signal transduction pathways which respond to oxygen levels. Mutations in this gene are associated with erythrocytosis familial type 4. EPAS1
ENSG00000049540 2006 elastin This gene encodes a protein that is one of the two components of elastic fibers. The encoded protein is rich in hydrophobic amino acids such as glycine and proline, which form mobile hydrophobic regions bounded by crosslinks between lysine residues. Deletions and mutations in this gene are associated with supravalvular aortic stenosis (SVAS) and autosomal dominant cutis laxa. Multiple transcript variants encoding different isoforms have been found for this gene. ELN
ENSG00000088836 83959 solute carrier family 4 member 11 This gene encodes a voltage-regulated, electrogenic sodium-coupled borate cotransporter that is essential for borate homeostasis, cell growth and cell proliferation. Mutations in this gene have been associated with a number of endothelial corneal dystrophies including recessive corneal endothelial dystrophy 2, corneal dystrophy and perceptive deafness, and Fuchs endothelial corneal dystrophy. Multiple transcript variants encoding different isoforms have been described. SLC4A11
ENSG00000008513 6482 ST3 beta-galactoside alpha-2,3-sialyltransferase 1 The protein encoded by this gene is a type II membrane protein that catalyzes the transfer of sialic acid from CMP-sialic acid to galactose-containing substrates. The encoded protein is normally found in the Golgi but can be proteolytically processed to a soluble form. Correct glycosylation of the encoded protein may be critical to its sialyltransferase activity. This protein, which is a member of glycosyltransferase family 29, can use the same acceptor substrates as does sialyltransferase 4B. Two transcript variants encoding the same protein have been found for this gene. Other transcript variants may exist, but have not been fully characterized yet. ST3GAL1
ENSG00000140416 7168 tropomyosin 1 (alpha) This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. TPM1
ENSG00000107796 59 actin, alpha 2, smooth muscle, aorta The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in smooth muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. ACTA2
ENSG00000167549 84940 coronin 6 NA CORO6
ENSG00000171401 3860 keratin 13 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. KRT13
ENSG00000158106 114822 rhophilin, Rho GTPase binding protein 1 NA RHPN1
ENSG00000204983 5644 protease, serine 1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. PRSS1
ENSG00000115386 5967 regenerating family member 1 alpha This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. REG1A
ENSG00000137857 53905 dual oxidase 1 The protein encoded by this gene is a glycoprotein and a member of the NADPH oxidase family. The synthesis of thyroid hormone is catalyzed by a protein complex located at the apical membrane of thyroid follicular cells. This complex contains an iodide transporter, thyroperoxidase, and a peroxide generating system that includes proteins encoded by this gene and the similar DUOX2 gene. This protein is known as dual oxidase because it has both a peroxidase homology domain and a gp91phox domain. This protein generates hydrogen peroxide and thereby plays a role in the activity of thyroid peroxidase, lactoperoxidase, and in lactoperoxidase-mediated antimicrobial defense at mucosal surfaces. Two alternatively spliced transcript variants encoding the same protein have been described for this gene. DUOX1
ENSG00000159251 70 actin, alpha, cardiac muscle 1 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). ACTC1
ENSG00000163631 213 albumin Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. ALB
ENSG00000160307 6285 S100 calcium binding protein B The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21; however, this gene is located at 21q22.3. This protein may function in neurite extension, proliferation of melanoma cells, stimulation of Ca2+ fluxes, inhibition of PKC-mediated phosphorylation, astrocytosis and axonal proliferation, and inhibition of microtubule assembly. Chromosomal rearrangements and altered expression of this gene have been implicated in several neurological, neoplastic, and other types of diseases, including Alzheimer’s disease, Down’s syndrome, epilepsy, amyotrophic lateral sclerosis, melanoma, and type I diabetes. S100B
ENSG00000142937 6202 ribosomal protein S8 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S8E family of ribosomal proteins. It is located in the cytoplasm. Increased expression of this gene in colorectal tumors and colon polyps compared to matched normal colonic mucosa has been observed. This gene is co-transcribed with the small nucleolar RNA genes U38A, U38B, U39, and U40, which are located in its fourth, fifth, first, and second introns, respectively. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. RPS8
ENSG00000054690 57475 pleckstrin homology, MyTH4 and FERM domain containing H1 NA PLEKHH1
ENSG00000142789 10136 chymotrypsin like elastase family member 3A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. CELA3A
ENSG00000211445 2878 glutathione peroxidase 3 This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTRs of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. GPX3
ENSG00000112306 6206 ribosomal protein S12 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S12E family of ribosomal proteins. It is located in the cytoplasm. Increased expression of this gene in colorectal cancers compared to matched normal colonic mucosa has been observed. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. RPS12
ENSG00000180139 ENSG00000180139 ACTA2 antisense RNA 1 NA ACTA2-AS1
ENSG00000112715 7422 vascular endothelial growth factor A This gene is a member of the PDGF/VEGF growth factor family. It encodes a heparin-binding protein, which exists as a disulfide-linked homodimer. This growth factor induces proliferation and migration of vascular endothelial cells, and is essential for both physiological and pathological angiogenesis. Disruption of this gene in mice resulted in abnormal embryonic blood vessel formation. This gene is upregulated in many known tumors and its expression is correlated with tumor stage and progression. Elevated levels of this protein are found in patients with POEMS syndrome, also known as Crow-Fukase syndrome. Allelic variants of this gene have been associated with microvascular complications of diabetes 1 (MVCD1) and atherosclerosis. Alternatively spliced transcript variants encoding different isoforms have been described. There is also evidence for alternative translation initiation from upstream non-AUG (CUG) codons resulting in additional isoforms. A recent study showed that a C-terminally extended isoform is produced by use of an alternative in-frame translation termination codon via a stop codon readthrough mechanism, and that this isoform is antiangiogenic. Expression of some isoforms derived from the AUG start codon is regulated by a small upstream open reading frame, which is located within an internal ribosome entry site. VEGFA
ENSG00000187244 4059 basal cell adhesion molecule (Lutheran blood group) This gene encodes Lutheran blood group glycoprotein, a member of the immunoglobulin superfamily and a receptor for the extracellular matrix protein, laminin. The protein contains five extracellular immunoglobulin domains, a single transmembrane domain, and a short C-terminal cytoplasmic tail. This protein may play a role in epithelial cell cancer and in vaso-occlusion of red blood cells in sickle cell disease. Polymorphisms in this gene define some of the antigens in the Lutheran system and also the Auberger system. Inactivating variants of this gene result in the recessive Lutheran null phenotype, Lu(a-b-), of the Lutheran blood group. Two transcript variants encoding different isoforms have been found for this gene. BCAM
ENSG00000175084 1674 desmin This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. DES
ENSG00000197893 4892 nebulin related anchoring protein NA NRAP
ENSG00000124466 27076 LY6/PLAUR domain containing 3 NA LYPD3
ENSG00000085063 966 CD59 molecule This gene encodes a cell surface glycoprotein that regulates complement-mediated cell lysis, and it is involved in lymphocyte signal transduction. This protein is a potent inhibitor of the complement membrane attack complex, whereby it binds complement C8 and/or C9 during the assembly of this complex, thereby inhibiting the incorporation of multiple copies of C9 into the complex, which is necessary for osmolytic pore formation. This protein also plays a role in signal transduction pathways in the activation of T cells. Mutations in this gene cause CD59 deficiency, a disease resulting in hemolytic anemia and thrombosis, and which causes cerebral infarction. Multiple alternatively spliced transcript variants, which encode the same protein, have been identified for this gene. CD59
ENSG00000173432 6288 serum amyloid A1 This gene encodes a member of the serum amyloid A family of apolipoproteins. The encoded preproprotein is proteolytically processed to generate the mature protein. This protein is a major acute phase protein that is highly expressed in response to inflammation and tissue injury. This protein also plays an important role in HDL metabolism and cholesterol homeostasis. High levels of this protein are associated with chronic inflammatory diseases including atherosclerosis, rheumatoid arthritis, Alzheimer’s disease and Crohn’s disease. This protein may also be a potential biomarker for certain tumors. Alternate splicing results in multiple transcript variants that encode the same protein. A pseudogene of this gene is found on chromosome 11. SAA1
ENSG00000133030 23164 myosin phosphatase Rho interacting protein NA MPRIP
ENSG00000117318 3399 inhibitor of DNA binding 3, HLH protein The protein encoded by this gene is a helix-loop-helix (HLH) protein that can form heterodimers with other HLH proteins. However, the encoded protein lacks a basic DNA-binding domain and therefore inhibits the DNA binding of any HLH protein with which it interacts. ID3
ENSG00000166348 159195 ubiquitin specific peptidase 54 NA USP54
ENSG00000115112 29842 transcription factor CP2-like 1 NA TFCP2L1
ENSG00000105270 25999 CAP-Gly domain containing linker protein 3 This gene encodes a member of the cytoplasmic linker protein 170 family. Members of this protein family contain a cytoskeleton-associated protein glycine-rich domain and mediate the interaction of microtubules with cellular organelles. The encoded protein plays a role in T cell apoptosis by facilitating the association of tubulin and the lipid raft ganglioside GD3. The encoded protein also functions as a scaffold protein mediating membrane localization of phosphorylated protein kinase B. Alternatively spliced transcript variants have been observed for this gene. CLIP3
ENSG00000177600 6181 ribosomal protein lateral stalk subunit P2 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal phosphoprotein that is a component of the 60S subunit. The protein, which is a functional equivalent of the E. coli L7/L12 ribosomal protein, belongs to the L12P family of ribosomal proteins. It plays an important role in the elongation step of protein synthesis. Unlike most ribosomal proteins, which are basic, the encoded protein is acidic. Its C-terminal end is nearly identical to the C-terminal ends of the ribosomal phosphoproteins P0 and P1. The P2 protein can interact with P0 and P1 to form a pentameric complex consisting of P1 and P2 dimers, and a P0 monomer. The protein is located in the cytoplasm. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. RPLP2
ENSG00000197905 7004 TEA domain transcription factor 4 This gene product is a member of the transcriptional enhancer factor (TEF) family of transcription factors, which contain the TEA/ATTS DNA-binding domain. It is preferentially expressed in the skeletal muscle, and binds to the M-CAT regulatory element found in promoters of muscle-specific genes to direct their gene expression. Alternatively spliced transcripts encoding distinct isoforms, some of which are translated through the use of a non-AUG (UUG) initiation codon, have been described for this gene. TEAD4
ENSG00000131471 8639 amine oxidase, copper containing 3 This gene encodes a member of the semicarbazide-sensitive amine oxidase family. Copper amine oxidases catalyze the oxidative conversion of amines to aldehydes in the presence of copper and quinone cofactor. The encoded protein is localized to the cell surface, has adhesive properties as well as monoamine oxidase activity, and may be involved in leukocyte trafficking. Alterations in levels of the encoded protein may be associated with many diseases, including diabetes mellitus. A pseudogene of this gene has been described and is located approximately 9-kb downstream on the same chromosome. Alternative splicing results in multiple transcript variants. AOC3
ENSG00000179218 811 calreticulin Calreticulin is a multifunctional protein that acts as a major Ca(2+)-binding (storage) protein in the lumen of the endoplasmic reticulum. It is also found in the nucleus, suggesting that it may have a role in transcription regulation. Calreticulin binds to the synthetic peptide KLGFFKR, which is almost identical to an amino acid sequence in the DNA-binding domain of the superfamily of nuclear receptors. Calreticulin binds to antibodies in certain sera of systemic lupus and Sjogren patients which contain anti-Ro/SSA antibodies, it is highly conserved among species, and it is located in the endoplasmic and sarcoplasmic reticulum where it may bind calcium. The amino terminus of calreticulin interacts with the DNA-binding domain of the glucocorticoid receptor and prevents the receptor from binding to its specific glucocorticoid response element. Calreticulin can inhibit the binding of androgen receptor to its hormone-responsive DNA element and can inhibit androgen receptor and retinoic acid receptor transcriptional activities in vivo, as well as retinoic acid-induced neuronal differentiation. Thus, calreticulin can act as an important modulator of the regulation of gene transcription by nuclear hormone receptors. Systemic lupus erythematosus is associated with increased autoantibody titers against calreticulin but calreticulin is not a Ro/SS-A antigen. Earlier papers referred to calreticulin as an Ro/SS-A antigen but this was later disproven. Increased autoantibody titer against human calreticulin is found in infants with complete congenital heart block of both the IgG and IgM classes. CALR
ENSG00000035664 23604 death associated protein kinase 2 This gene encodes a protein that belongs to the serine/threonine protein kinase family. This protein contains a N-terminal protein kinase domain followed by a conserved calmodulin-binding domain with significant similarity to that of death-associated protein kinase 1 (DAPK1), a positive regulator of programmed cell death. Overexpression of this gene was shown to induce cell apoptosis. It uses multiple polyadenylation sites. DAPK2
ENSG00000136603 6498 SKI-like proto-oncogene The protein encoded by this gene is a component of the SMAD pathway, which regulates cell growth and differentiation through transforming growth factor-beta (TGFB). In the absence of ligand, the encoded protein binds to the promoter region of TGFB-responsive genes and recruits a nuclear repressor complex. TGFB signaling causes SMAD3 to enter the nucleus and degrade this protein, allowing these genes to be activated. Four transcript variants encoding three different isoforms have been found for this gene. SKIL
ENSG00000196091 4604 myosin binding protein C, slow type This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. MYBPC1
ENSG00000170477 3851 keratin 4 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. KRT4
ENSG00000211896 ENSG00000211896 immunoglobulin heavy constant gamma 1 (G1m marker) NA IGHG1
ENSG00000157240 8321 frizzled class receptor 1 Members of the ‘frizzled’ gene family encode 7-transmembrane domain proteins that are receptors for Wnt signaling proteins. The FZD1 protein contains a signal peptide, a cysteine-rich domain in the N-terminal extracellular region, 7 transmembrane domains, and a C-terminal PDZ domain-binding motif. The FZD1 transcript is expressed in various tissues. FZD1
ENSG00000143819 2052 epoxide hydrolase 1 Epoxide hydrolase is a critical biotransformation enzyme that converts epoxides from the degradation of aromatic compounds to trans-dihydrodiols which can be conjugated and excreted from the body. Epoxide hydrolase functions in both the activation and detoxification of epoxides. Mutations in this gene cause preeclampsia, epoxide hydrolase deficiency or increased epoxide hydrolase activity. Alternatively spliced transcript variants encoding the same protein have been found for this gene. EPHX1
ENSG00000170017 214 activated leukocyte cell adhesion molecule This gene encodes activated leukocyte cell adhesion molecule (ALCAM), also known as CD166 (cluster of differentiation 166), which is a member of a subfamily of immunoglobulin receptors with five immunoglobulin-like domains (VVC2C2C2) in the extracellular domain. This protein binds to T-cell differentiation antigen CD6, and is implicated in the processes of cell adhesion and migration. Multiple alternatively spliced transcript variants encoding different isoforms have been found. ALCAM
ENSG00000197119 123096 solute carrier family 25 member 29 This gene encodes a nuclear-encoded mitochondrial protein that is a member of the large family of solute carrier family 25 (SLC25) mitochondrial transporters. The members of this superfamily are involved in numerous metabolic pathways and cell functions. This gene product was previously reported to be a mitochondrial carnitine-acylcarnitine-like (CACL) translocase (PMID:128829710) or an ornithine transporter (designated ORNT3, PMID:19287344), however, a recent study characterized the main role of this protein as a mitochondrial transporter of basic amino acids, with a preference for arginine and lysine (PMID:24652292). Alternatively spliced transcript variants have been found for this gene. SLC25A29
ENSG00000198959 7052 transglutaminase 2 Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene acts as a monomer, is induced by retinoic acid, and appears to be involved in apoptosis. Finally, the encoded protein is the autoantigen implicated in celiac disease. Two transcript variants encoding different isoforms have been found for this gene. TGM2
ENSG00000142871 3491 cysteine rich angiogenic inducer 61 The secreted protein encoded by this gene is growth factor-inducible and promotes the adhesion of endothelial cells. The encoded protein interacts with several integrins and with heparan sulfate proteoglycan. This protein also plays a role in cell proliferation, differentiation, angiogenesis, apoptosis, and extracellular matrix formation. CYR61
# Write the queried Ensembl gene IDs for factor 15 to a text file, one ID per line.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 15, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
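
The export step above is repeated for each factor with only the index changing. A minimal sketch of a helper that could wrap it follows; `write_factor_genes` and its `out_dir` argument are hypothetical names that do not appear in the SFA scripts, and the demo writes to a temporary directory rather than `../utilities/`.

```r
# Hypothetical helper (not part of the original scripts): write the queried
# Ensembl IDs for factor k to gene_names_clus_<k>.txt, one ID per line.
write_factor_genes <- function(query_ids, k, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(as.character(query_ids), path,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)  # return the file path without printing it
}

# Demo with three IDs taken from the factor 15 table, written to a temp directory.
demo_ids <- c("ENSG00000112306", "ENSG00000180139", "ENSG00000112715")
demo_path <- write_factor_genes(demo_ids, k = 15, out_dir = tempdir())
readLines(demo_path)
```

In the analysis itself the call would presumably take `out$query` and the real output directory; that usage is an assumption, not something the original code does.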

Factor 16 Annotations

out <- mygene::queryMany(gene_list[16, ], scopes = "ensembl.gene",
                         fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
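
For this factor one queried ID (ENSG00000090920) has no match, so the result carries an extra `notfound` column that is TRUE for that row and NA elsewhere. The sketch below shows one way such rows could be separated before rendering. It assumes a data frame with the column names seen in the table (`query`, `X_id`, `symbol`, `name`, `summary`, `notfound`); `res`, `unmatched`, and `matched` are illustrative names, and the toy data stands in for `as.data.frame(out)`.

```r
# Toy stand-in for as.data.frame(out); values copied from the factor 16 table,
# with the summary column omitted for brevity.
res <- data.frame(
  query    = c("ENSG00000092054", "ENSG00000090920", "ENSG00000198125"),
  X_id     = c("4625", NA, "4151"),
  symbol   = c("MYH7", NA, "MB"),
  name     = c("myosin, heavy chain 7, cardiac muscle, beta", NA, "myoglobin"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# IDs that could not be matched (notfound is TRUE only for those rows).
unmatched <- res$query[!is.na(res$notfound) & res$notfound]

# Keep matched rows and put the available columns in a fixed order,
# dropping the notfound flag.
keep_cols <- intersect(c("query", "X_id", "symbol", "name", "summary"), names(res))
matched <- res[!(res$query %in% unmatched), keep_cols, drop = FALSE]
```

Using `intersect` keeps the sketch working whether or not a given column is present, which matters because `notfound` only appears when at least one ID is unmatched.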
kable(as.data.frame(out))
summary X_id query symbol name notfound
Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. 4625 ENSG00000092054 MYH7 myosin, heavy chain 7, cardiac muscle, beta NA
NA ENSG00000211895 ENSG00000211895 IGHA1 immunoglobulin heavy constant alpha 1 NA
This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. 4151 ENSG00000198125 MB myoglobin NA
This gene is a member of the immunoglobulin superfamily. The encoded poly-Ig receptor binds polymeric immunoglobulin molecules at the basolateral surface of epithelial cells; the complex is then transported across the cell to be secreted at the apical surface. A significant association was found between immunoglobulin A nephropathy and several SNPs in this gene. 5284 ENSG00000162896 PIGR polymeric immunoglobulin receptor NA
NA ENSG00000211890 ENSG00000211890 IGHA2 immunoglobulin heavy constant alpha 2 (A2m marker) NA
This gene encodes the pulmonary-associated surfactant protein B (SPB), an amphipathic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. The SPB enhances the rate of spreading and increases the stability of surfactant monolayers in vitro. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 1, also called pulmonary alveolar proteinosis due to surfactant protein B deficiency, and are associated with fatal respiratory distress in the neonatal period. Alternatively spliced transcript variants encoding the same protein have been identified. 6439 ENSG00000168878 SFTPB surfactant protein B NA
The protein encoded by this gene is a member of the keratin family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. The type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. Unlike its related family members, this smallest known acidic cytokeratin is not paired with a basic cytokeratin in epithelial cells. It is specifically expressed in the periderm, the transiently superficial layer that envelopes the developing epidermis. The type I cytokeratins are clustered in a region of chromosome 17q12-q21. 3880 ENSG00000171345 KRT19 keratin 19 NA
This gene is one of several genes encoding pulmonary-surfactant associated proteins (SFTPA) located on chromosome 10. Mutations in this gene and a highly similar gene located nearby, which affect the highly conserved carbohydrate recognition domain, are associated with idiopathic pulmonary fibrosis. The current version of the assembly displays only a single centromeric SFTPA gene pair rather than the two gene pairs shown in the previous assembly which were thought to have resulted from a duplication. 729238 ENSG00000185303 SFTPA2 surfactant protein A2 NA
This gene encodes a major constituent of the human complement subcomponent C1q. C1q associates with C1r and C1s in order to yield the first component of the serum complement system. Deficiency of C1q has been associated with lupus erythematosus and glomerulonephritis. C1q is composed of 18 polypeptide chains: six A-chains, six B-chains, and six C-chains. Each chain contains a collagen-like region located near the N terminus and a C-terminal globular region. The A-, B-, and C-chains are arranged in the order A-C-B on chromosome 1. This gene encodes the B-chain polypeptide of human complement subcomponent C1q. 713 ENSG00000173369 C1QB complement component 1, q subcomponent, B chain NA
This gene encodes a member of the pancreatic-type of secretory ribonucleases, a subset of the ribonuclease A superfamily. The encoded endonuclease cleaves internal phosphodiester RNA bonds on the 3’-side of pyrimidine bases. It prefers poly(C) as a substrate and hydrolyzes 2’,3’-cyclic nucleotides, with a pH optimum near 8.0. The encoded protein is monomeric and more commonly acts to degrade ds-RNA over ss-RNA. Alternative splicing occurs at this locus and four transcript variants encoding the same protein have been identified. 6035 ENSG00000129538 RNASE1 ribonuclease A family member 1, pancreatic NA
This gene encodes the regulatory light chain associated with cardiac myosin beta (or slow) heavy chain. Ca+ triggers the phosphorylation of regulatory light chain that in turn triggers contraction. Mutations in this gene are associated with mid-left ventricular chamber type hypertrophic cardiomyopathy. 4633 ENSG00000111245 MYL2 myosin light chain 2 NA
Sarcomere assembly is regulated by the muscle protein titin. Titin is a giant elastic protein with kinase activity that extends half the length of a sarcomere. It serves as a scaffold to which myofibrils and other muscle related proteins are attached. This gene encodes a protein found in striated and cardiac muscle that binds to the titin Z1-Z2 domains and is a substrate of titin kinase, interactions thought to be critical to sarcomere assembly. Mutations in this gene are associated with limb-girdle muscular dystrophy type 2G. 8557 ENSG00000173991 TCAP titin-cap NA
This gene encodes a major constituent of the human complement subcomponent C1q. C1q associates with C1r and C1s in order to yield the first component of the serum complement system. A deficiency in C1q has been associated with lupus erythematosus and glomerulonephritis. C1q is composed of 18 polypeptide chains: six A-chains, six B-chains, and six C-chains. Each chain contains a collagen-like region located near the N-terminus, and a C-terminal globular region. The A-, B-, and C-chains are arranged in the order A-C-B on chromosome 1. This gene encodes the C-chain polypeptide of human complement subcomponent C1q. Alternatively spliced transcript variants that encode the same protein have been found for this gene. 714 ENSG00000159189 C1QC complement component 1, q subcomponent, C chain NA
This gene encodes a lung surfactant protein that is a member of a subfamily of C-type lectins called collectins. The encoded protein binds specific carbohydrate moieties found on lipids and on the surface of microorganisms. This protein plays an essential role in surfactant homeostasis and in the defense against respiratory pathogens. Mutations in this gene are associated with idiopathic pulmonary fibrosis. Alternate splicing results in multiple transcript variants. 653509 ENSG00000122852 SFTPA1 surfactant protein A1 NA
This gene encodes an enzyme involved in fatty acid biosynthesis, primarily the synthesis of oleic acid. The protein belongs to the fatty acid desaturase family and is an integral membrane protein located in the endoplasmic reticulum. Transcripts of approximately 3.9 and 5.2 kb, differing only by alternative polyadenylation signals, have been detected. A gene encoding a similar enzyme is located on chromosome 4 and a pseudogene of this gene is located on chromosome 17. 6319 ENSG00000099194 SCD stearoyl-CoA desaturase NA
This gene encodes the pulmonary-associated surfactant protein C (SPC), an extremely hydrophobic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 2, also called pulmonary alveolar proteinosis due to surfactant protein C deficiency, and are associated with interstitial lung disease in older infants, children, and adults. Alternatively spliced transcript variants encoding different protein isoforms have been identified. 6440 ENSG00000168484 SFTPC surfactant protein C NA
Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L13E family of ribosomal proteins. It is located in the cytoplasm. This gene is expressed at significantly higher levels in benign breast lesions than in breast carcinomas. Alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. 6137 ENSG00000167526 RPL13 ribosomal protein L13 NA
This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. 3858 ENSG00000186395 KRT10 keratin 10 NA
The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. The expression of this gene is restricted to small intestine, colon, and rectum, and it is underexpressed in colorectal cancer. 3960 ENSG00000171747 LGALS4 galectin 4 NA
The protein encoded by this gene is one of several isozymes of carbonic anhydrase, which catalyzes reversible hydration of carbon dioxide. Defects in this enzyme are associated with osteopetrosis and renal tubular acidosis. Two transcript variants encoding different isoforms have been found for this gene. 760 ENSG00000104267 CA2 carbonic anhydrase 2 NA
The protein encoded by this gene is a member of the protein tyrosine phosphatase (PTP) family. PTPs are known to be signaling molecules that regulate a variety of cellular processes including cell growth, differentiation, mitotic cycle, and oncogenic transformation. This PTP possesses an extracellular region, a single transmembrane region, and two tandem intracytoplasmic catalytic domains, and thus represents a receptor-type PTP. The extracellular region contains three Ig-like domains, and nine non-Ig like domains similar to that of neural-cell adhesion molecule. This PTP was shown to function in the regulation of epithelial cell-cell contacts at adherens junctions, as well as in the control of beta-catenin signaling. An increased expression level of this protein was found in the insulin-responsive tissue of obese, insulin-resistant individuals, and may contribute to the pathogenesis of insulin resistance. Two alternatively spliced transcript variants of this gene, which encode distinct proteins, have been reported. 5792 ENSG00000142949 PTPRF protein tyrosine phosphatase, receptor type F NA
The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. 1158 ENSG00000104879 CKM creatine kinase, M-type NA
Complement component C3 plays a central role in the activation of complement system. Its activation is required for both classical and alternative complement activation pathways. The encoded preproprotein is proteolytically processed to generate alpha and beta subunits that form the mature protein, which is then further processed to generate numerous peptide products. The C3a peptide, also known as the C3a anaphylatoxin, modulates inflammation and possesses antimicrobial activity. Mutations in this gene are associated with atypical hemolytic uremic syndrome and age-related macular degeneration in human patients. 718 ENSG00000125730 C3 complement component 3 NA
The protein encoded by this gene is a member of the scavenger receptor cysteine-rich (SRCR) superfamily, and is exclusively expressed in monocytes and macrophages. It functions as an acute phase-regulated receptor involved in the clearance and endocytosis of hemoglobin/haptoglobin complexes by macrophages, and may thereby protect tissues from free hemoglobin-mediated oxidative damage. This protein may also function as an innate immune sensor for bacteria and inducer of local inflammation. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. 9332 ENSG00000177575 CD163 CD163 molecule NA
This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. 1277 ENSG00000108821 COL1A1 collagen type I alpha 1 NA
This gene encodes a member of the selenium-binding protein family. Selenium is an essential nutrient that exhibits potent anticarcinogenic properties, and deficiency of selenium may cause certain neurologic diseases. The effects of selenium in preventing cancer and neurologic diseases may be mediated by selenium-binding proteins, and decreased expression of this gene may be associated with several types of cancer. The encoded protein may play a selenium-dependent role in ubiquitination/deubiquitination-mediated protein degradation. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 8991 ENSG00000143416 SELENBP1 selenium binding protein 1 NA
NA 8531 ENSG00000060138 YBX3 Y-box binding protein 3 NA
This gene encodes a major constituent of the human complement subcomponent C1q. C1q associates with C1r and C1s in order to yield the first component of the serum complement system. Deficiency of C1q has been associated with lupus erythematosus and glomerulonephritis. C1q is composed of 18 polypeptide chains: six A-chains, six B-chains, and six C-chains. Each chain contains a collagen-like region located near the N terminus and a C-terminal globular region. The A-, B-, and C-chains are arranged in the order A-C-B on chromosome 1. This gene encodes the A-chain polypeptide of human complement subcomponent C1q. 712 ENSG00000173372 C1QA complement component 1, q subcomponent, A chain NA
This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 7169 ENSG00000198467 TPM2 tropomyosin 2 (beta) NA
The protein encoded by this gene is a cell-surface glycoprotein involved in cell-cell interactions, cell adhesion and migration. It is a receptor for hyaluronic acid (HA) and can also interact with other ligands, such as osteopontin, collagens, and matrix metalloproteinases (MMPs). This protein participates in a wide variety of cellular functions including lymphocyte activation, recirculation and homing, hematopoiesis, and tumor metastasis. Transcripts for this gene undergo complex alternative splicing that results in many functionally distinct isoforms, however, the full length nature of some of these variants has not been determined. Alternative splicing is the basis for the structural and functional diversity of this protein, and may be related to tumor metastasis. 960 ENSG00000026508 CD44 CD44 molecule (Indian blood group) NA
This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene. 2318 ENSG00000128591 FLNC filamin C NA
NA NA ENSG00000090920 NA NA TRUE
Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S13P family of ribosomal proteins. It is located in the cytoplasm. The gene product of the E. coli ortholog (ribosomal protein S13) is involved in the binding of fMet-tRNA, and thus, in the initiation of translation. This gene is an ortholog of mouse Ke3. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. 6222 ENSG00000231500 RPS18 ribosomal protein S18 NA
The protein encoded by this gene is an isozyme of the long-chain fatty-acid-coenzyme A ligase family. Although differing in substrate specificity, subcellular localization, and tissue distribution, all isozymes of this family convert free long-chain fatty acids into fatty acyl-CoA esters, and thereby play a key role in lipid biosynthesis and fatty acid degradation. This isozyme is highly expressed in uterus and spleen, and in trace amounts in normal brain, but has markedly increased levels in malignant gliomas. This gene functions in mediating fatty acid-induced glioma cell growth. Three transcript variants encoding two different isoforms have been found for this gene. 51703 ENSG00000197142 ACSL5 acyl-CoA synthetase long-chain family member 5 NA
This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. 1278 ENSG00000164692 COL1A2 collagen type I alpha 2 chain NA
This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. 7168 ENSG00000140416 TPM1 tropomyosin 1 (alpha) NA
The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. 5265 ENSG00000197249 SERPINA1 serpin family A member 1 NA
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3848 ENSG00000167768 KRT1 keratin 1 NA
Troponin is a central regulatory protein of striated muscle contraction, and together with tropomyosin, is located on the actin filament. Troponin consists of 3 subunits: TnI, which is the inhibitor of actomyosin ATPase; TnT, which contains the binding site for tropomyosin; and TnC, the protein encoded by this gene. The binding of calcium to TnC abolishes the inhibitory action of TnI, thus allowing the interaction of actin with myosin, the hydrolysis of ATP, and the generation of tension. Mutations in this gene are associated with cardiomyopathy dilated type 1Z. 7134 ENSG00000114854 TNNC1 troponin C1, slow skeletal and cardiac type NA
This gene encodes a mitochondrial enzyme that catalyzes the conversion of oxaloacetate to phosphoenolpyruvate in the presence of guanosine triphosphate (GTP). A cytosolic form of this protein is encoded by a different gene and is the key enzyme of gluconeogenesis in the liver. Alternatively spliced transcript variants have been described. 5106 ENSG00000100889 PCK2 phosphoenolpyruvate carboxykinase 2, mitochondrial NA
NA ENSG00000237973 ENSG00000237973 MTCO1P12 MT-CO1 pseudogene 12 NA
This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. 2597 ENSG00000111640 GAPDH glyceraldehyde-3-phosphate dehydrogenase NA
The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. 4629 ENSG00000133392 MYH11 myosin, heavy chain 11, smooth muscle NA
Plastins are a family of actin-binding proteins that are conserved throughout eukaryote evolution and expressed in most tissues of higher eukaryotes. In humans, two ubiquitous plastin isoforms (L and T) have been identified. The protein encoded by this gene is a third distinct plastin isoform, which is specifically expressed at high levels in the small intestine. Alternatively spliced transcript variants varying in the 5’ UTR, but encoding the same protein, have been found for this gene. A pseudogene of this gene is found on chromosome 11. 5357 ENSG00000120756 PLS1 plastin 1 NA
This gene is a member of the glutathione peroxidase family and encodes a selenium-dependent glutathione peroxidase that is one of two isoenzymes responsible for the majority of the glutathione-dependent hydrogen peroxide-reducing activity in the epithelium of the gastrointestinal tract. The protein encoded by this locus contains a selenocysteine (Sec) residue encoded by the UGA codon, which normally signals translation termination. Alternatively spliced transcript variants have been described. 2877 ENSG00000176153 GPX2 glutathione peroxidase 2 NA
This gene encodes a membrane-bound protein that is a member of the mucin family. Mucins are O-glycosylated proteins that play an essential role in forming protective mucous barriers on epithelial surfaces. These proteins also play a role in intracellular signaling. This protein is expressed on the apical surface of epithelial cells that line the mucosal surfaces of many different tissues including lung, breast, stomach and pancreas. This protein is proteolytically cleaved into alpha and beta subunits that form a heterodimeric complex. The N-terminal alpha subunit functions in cell-adhesion and the C-terminal beta subunit is involved in cell signaling. Overexpression, aberrant intracellular localization, and changes in glycosylation of this protein have been associated with carcinomas. This gene is known to contain a highly polymorphic variable number tandem repeats (VNTR) domain. Alternate splicing results in multiple transcript variants. 4582 ENSG00000185499 MUC1 mucin 1, cell surface associated NA
The lethal (2) giant larvae protein of Drosophila plays a role in asymmetric cell division, epithelial cell polarity, and cell migration. This human gene encodes a protein similar to lethal (2) giant larvae of Drosophila. In fly, the protein’s ability to localize cell fate determinants is regulated by the atypical protein kinase C (aPKC). In human, this protein interacts with aPKC-containing complexes and is cortically localized in mitotic cells. Alternative splicing results in multiple transcript variants encoding different isoforms. 3993 ENSG00000073350 LLGL2 LLGL2, scribble cell polarity complex component NA
This gene encodes a member of the carboxylesterase large family. The family members are responsible for the hydrolysis or transesterification of various xenobiotics, such as cocaine and heroin, and endogenous substrates with ester, thioester, or amide bonds. They may participate in fatty acyl and cholesterol ester metabolism, and may play a role in the blood-brain barrier system. The protein encoded by this gene is the major intestinal enzyme and functions in intestine drug clearance. Alternatively spliced transcript variants have been found for this gene. 8824 ENSG00000172831 CES2 carboxylesterase 2 NA
Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit, where it forms part of the domain where translation is initiated. The protein belongs to the S3P family of ribosomal proteins. Studies of the mouse and rat proteins have demonstrated that the protein has an extraribosomal role as an endonuclease involved in the repair of UV-induced DNA damage. The protein appears to be located in both the cytoplasm and nucleus but not in the nucleolus. Higher levels of expression of this gene in colon adenocarcinomas and adenomatous polyps compared to adjacent normal colonic mucosa have been observed. This gene is co-transcribed with the small nucleolar RNA genes U15A and U15B, which are located in its first and fifth introns, respectively. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. 6188 ENSG00000149273 RPS3 ribosomal protein S3 NA
NA 79762 ENSG00000162817 C1orf115 chromosome 1 open reading frame 115 NA
NA ENSG00000249007 ENSG00000249007 RP11-510N19.5 NA NA
Fibulin 1 is a secreted glycoprotein that becomes incorporated into a fibrillar extracellular matrix. Calcium-binding is apparently required to mediate its binding to laminin and nidogen. It mediates platelet adhesion via binding fibrinogen. Four splice variants which differ in the 3’ end have been identified. Each variant encodes a different isoform, but no functional distinctions have been identified among the four variants. 2192 ENSG00000077942 FBLN1 fibulin 1 NA
NA 151887 ENSG00000091986 CCDC80 coiled-coil domain containing 80 NA
Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). 70 ENSG00000159251 ACTC1 actin, alpha, cardiac muscle 1 NA
NA 54502 ENSG00000163694 RBM47 RNA binding motif protein 47 NA
The diastrophic dysplasia sulfate transporter is a transmembrane glycoprotein implicated in the pathogenesis of several human chondrodysplasias. It apparently is critical in cartilage for sulfation of proteoglycans and matrix organization. 1836 ENSG00000155850 SLC26A2 solute carrier family 26 member 2 NA
NA 51599 ENSG00000105699 LSR lipolysis stimulated lipoprotein receptor NA
Members of the ‘frizzled’ gene family encode 7-transmembrane domain proteins that are receptors for Wnt signaling proteins. The FZD5 protein is believed to be the receptor for the Wnt5A ligand. 7855 ENSG00000163251 FZD5 frizzled class receptor 5 NA
CNDP2, also known as tissue carnosinase and peptidase A (EC 3.4.13.18), is a nonspecific dipeptidase rather than a selective carnosinase (Teufel et al., 2003 [PubMed 12473676]). 55748 ENSG00000133313 CNDP2 CNDP dipeptidase 2 (metallopeptidase M20 family) NA
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3849 ENSG00000172867 KRT2 keratin 2 NA
NA 54884 ENSG00000042445 RETSAT retinol saturase NA
Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S19E family of ribosomal proteins. It is located in the cytoplasm. Mutations in this gene cause Diamond-Blackfan anemia (DBA), a constitutional erythroblastopenia characterized by absent or decreased erythroid precursors, in a subset of patients. This suggests a possible extra-ribosomal function for this gene in erythropoietic differentiation and proliferation, in addition to its ribosomal function. Higher expression levels of this gene in some primary colon carcinomas compared to matched normal colon tissues have been observed. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. 6223 ENSG00000105372 RPS19 ribosomal protein S19 NA
Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a muscle-specific, alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Several transcript variants encoding different isoforms have been found for this gene. 88 ENSG00000077522 ACTN2 actinin alpha 2 NA
This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, an N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. 7273 ENSG00000155657 TTN titin NA
NA 202333 ENSG00000164309 CMYA5 cardiomyopathy associated 5 NA
NA 4495 ENSG00000125144 MT1G metallothionein 1G NA
The protein encoded by this gene serves to anchor phosphodiesterase 4D to the Golgi/centrosome region of the cell. Defects in this gene may be a cause of myeloproliferative disorder (MBD) associated with eosinophilia. Several transcript variants encoding different isoforms have been found for this gene. 9659 ENSG00000178104 PDE4DIP phosphodiesterase 4D interacting protein NA
Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S8E family of ribosomal proteins. It is located in the cytoplasm. Increased expression of this gene in colorectal tumors and colon polyps compared to matched normal colonic mucosa has been observed. This gene is co-transcribed with the small nucleolar RNA genes U38A, U38B, U39, and U40, which are located in its fourth, fifth, first, and second introns, respectively. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. 6202 ENSG00000142937 RPS8 ribosomal protein S8 NA
This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. The encoded protein metabolizes drugs as well as the steroid hormones testosterone and progesterone. This gene is part of a cluster of cytochrome P450 genes on chromosome 7q21.1. Two pseudogenes of this gene have been identified within this cluster on chromosome 7. Expression of this gene is widely variable among populations, and a single nucleotide polymorphism that affects transcript splicing has been associated with susceptibility to hypertension. Alternative splicing results in multiple transcript variants. 1577 ENSG00000106258 CYP3A5 cytochrome P450 family 3 subfamily A member 5 NA
The protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intra-cellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the MRP subfamily which is involved in multi-drug resistance. The specific function of this protein has not yet been determined; however, this protein may play a role in the transport of biliary and intestinal excretion of organic anions. Alternatively spliced variants which encode different protein isoforms have been described; however, not all variants have been fully characterized. 8714 ENSG00000108846 ABCC3 ATP binding cassette subfamily C member 3 NA
This gene encodes a member of the mannose receptor family of proteins that contain a fibronectin type II domain and multiple C-type lectin-like domains. The encoded protein plays a role in extracellular matrix remodeling by mediating the internalization and lysosomal degradation of collagen ligands. Expression of this gene may play a role in the tumorigenesis and metastasis of several malignancies including breast cancer, gliomas and metastatic bone disease. 9902 ENSG00000011028 MRC2 mannose receptor C type 2 NA
NA 140738 ENSG00000171227 TMEM37 transmembrane protein 37 NA
Cytochrome c oxidase (COX), the terminal enzyme of the mitochondrial respiratory chain, catalyzes the electron transfer from reduced cytochrome c to oxygen. It is a heteromeric complex consisting of 3 catalytic subunits encoded by mitochondrial genes and multiple structural subunits encoded by nuclear genes. The mitochondrially-encoded subunits function in electron transfer, and the nuclear-encoded subunits may be involved in the regulation and assembly of the complex. This nuclear gene encodes polypeptide 2 (heart/muscle isoform) of subunit VIa, and polypeptide 2 is present only in striated muscles. Polypeptide 1 (liver isoform) of subunit VIa is encoded by a different gene, and is found in all non-muscle tissues. These two polypeptides share 66% amino acid sequence identity. 1339 ENSG00000156885 COX6A2 cytochrome c oxidase subunit 6A2 NA
This gene encodes the alpha chain of type VII collagen. The type VII collagen fibril, composed of three identical alpha collagen chains, is restricted to the basement zone beneath stratified squamous epithelia. It functions as an anchoring fibril between the external epithelia and the underlying stroma. Mutations in this gene are associated with all forms of dystrophic epidermolysis bullosa. In the absence of mutations, however, an acquired form of this disease can result from an autoimmune response made to type VII collagen. 1294 ENSG00000114270 COL7A1 collagen type VII alpha 1 NA
The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. 2752 ENSG00000135821 GLUL glutamate-ammonia ligase NA
Metabolic N-oxidation of the diet-derived amino-trimethylamine (TMA) is mediated by flavin-containing monooxygenase and is subject to an inherited FMO3 polymorphism in man, resulting in a small subpopulation with reduced TMA N-oxidation capacity and, in turn, fish odor syndrome (trimethylaminuria). Three forms of the enzyme, FMO1 found in fetal liver, FMO2 found in adult liver, and FMO3 are encoded by genes clustered in the 1q23-q25 region. Flavin-containing monooxygenases are NADPH-dependent flavoenzymes that catalyze the oxidation of soft nucleophilic heteroatom centers in drugs, pesticides, and xenobiotics. Alternative splicing results in multiple transcript variants. 2330 ENSG00000131781 FMO5 flavin containing monooxygenase 5 NA
The cytoplasmic peripheral membrane protein encoded by this gene functions as a protein-tyrosine kinase substrate in microvilli. As a member of the ERM protein family, this protein serves as an intermediate between the plasma membrane and the actin cytoskeleton. This protein plays a key role in cell surface structure adhesion, migration and organization, and it has been implicated in various human cancers. A pseudogene located on chromosome 3 has been identified for this gene. Alternatively spliced variants have also been described for this gene. 7430 ENSG00000092820 EZR ezrin NA
This gene encodes one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The product of this gene contains several domains similar to von Willebrand Factor type A domains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in this gene are associated with Bethlem myopathy and Ullrich scleroatonic muscular dystrophy. Three transcript variants have been identified for this gene. 1292 ENSG00000142173 COL6A2 collagen type VI alpha 2 NA
This gene encodes a serine/threonine protein kinase that plays an important role in cellular stress response. This kinase activates certain potassium, sodium, and chloride channels, suggesting an involvement in the regulation of processes such as cell survival, neuronal excitability, and renal sodium excretion. High levels of expression of this gene may contribute to conditions such as hypertension and diabetic nephropathy. Several alternatively spliced transcript variants encoding different isoforms have been noted for this gene. 6446 ENSG00000118515 SGK1 serum/glucocorticoid regulated kinase 1 NA
Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a member of the L18E family of ribosomal proteins that is a component of the 60S subunit. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 6141 ENSG00000063177 RPL18 ribosomal protein L18 NA
NA ENSG00000211899 ENSG00000211899 IGHM immunoglobulin heavy constant mu NA
NA 64081 ENSG00000108187 PBLD phenazine biosynthesis like protein domain containing NA
Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a member of the S17P family of ribosomal proteins that is a component of the 40S subunit. This gene is co-transcribed with the small nucleolar RNA gene U35B, which is located in the third intron. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed throughout the genome. 6205 ENSG00000142534 RPS11 ribosomal protein S11 NA
The protein encoded by this gene is a member of the chromogranin/secretogranin family of neuroendocrine secretory proteins. It is found in secretory vesicles of neurons and endocrine cells. This gene product is a precursor to three biologically active peptides; vasostatin, pancreastatin, and parastatin. These peptides act as autocrine or paracrine negative modulators of the neuroendocrine system. Two other peptides, catestatin and chromofungin, have antimicrobial activity and antifungal activity, respectively. Two transcript variants encoding different isoforms have been found for this gene. 1113 ENSG00000100604 CHGA chromogranin A NA
The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. 27075 ENSG00000106537 TSPAN13 tetraspanin 13 NA
The protein encoded by this gene is an epithelial-derived, integral membrane serine protease. This protease forms a complex with the Kunitz-type serine protease inhibitor, HAI-1, and is found to be activated by sphingosine 1-phosphate. This protease has been shown to cleave and activate hepatocyte growth factor/scattering factor, and urokinase plasminogen activator, which suggests that this protease functions as an epithelial membrane activator for other proteases and latent growth factors. The expression of this protease has been associated with breast, colon, prostate, and ovarian tumors, which implicates it in cancer invasion and metastasis. 6768 ENSG00000149418 ST14 suppression of tumorigenicity 14 NA
This gene encodes a member of the claudin family. Claudins are integral membrane proteins and components of tight junction strands. Tight junction strands serve as a physical barrier to prevent solutes and water from passing freely through the paracellular space between epithelial or endothelial cell sheets, and also play critical roles in maintaining cell polarity and signal transductions. Differential expression of this gene has been observed in different types of malignancies, including breast cancer, ovarian cancer, hepatocellular carcinomas, urinary tumors, prostate cancer, lung cancer, head and neck cancers, thyroid carcinomas, etc. Alternatively spliced transcript variants encoding different isoforms have been found. 1366 ENSG00000181885 CLDN7 claudin 7 NA
NA ENSG00000211675 ENSG00000211675 IGLC1 immunoglobulin lambda constant 1 (Mcg marker) NA
Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L1P family of ribosomal proteins. It is located in the cytoplasm. The expression of this gene is downregulated in the thymus by cyclosporin-A (CsA), an immunosuppressive drug. Studies in mice have shown that the expression of the ribosomal protein L10a gene is downregulated in neural precursor cells during development. This gene previously was referred to as NEDD6 (neural precursor cell expressed, developmentally downregulated 6), but it has been renamed RPL10A (ribosomal protein 10a). As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. 4736 ENSG00000198755 RPL10A ribosomal protein L10a NA
NA 6289 ENSG00000134339 SAA2 serum amyloid A2 NA
This gene encodes a large, transmembrane receptor protein which may function in angiogenesis, lymphocyte homing, cell adhesion, or receptor scavenging. The protein contains 7 fasciclin, 16 epidermal growth factor (EGF)-like, and 2 laminin-type EGF-like domains as well as a C-type lectin-like hyaluronan-binding Link module. The protein is primarily expressed on sinusoidal endothelial cells of liver, spleen, and lymph node. The receptor has been shown to endocytose ligands such as low density lipoprotein, Gram-positive and Gram-negative bacteria, and advanced glycosylation end products. Supporting its possible role as a scavenger receptor, the protein rapidly cycles between the plasma membrane and early endosomes. 23166 ENSG00000010327 STAB1 stabilin 1 NA
This gene encodes one of the immunoglobulin lambda-like polypeptides. It is located within the immunoglobulin lambda locus but it does not require somatic rearrangement for expression. The first exon of this gene is unrelated to immunoglobulin variable genes; the second and third exons are the immunoglobulin lambda joining 1 and the immunoglobulin lambda constant 1 gene segments. Alternative splicing results in multiple transcript variants. 100423062 ENSG00000254709 IGLL5 immunoglobulin lambda like polypeptide 5 NA
The uronate cycle functions as an alternative glucose metabolic pathway, accounting for about 5% of daily glucose catabolism. The product of this gene catalyzes the dehydrogenation of L-gulonate into dehydro-L-gulonate in the uronate cycle. The enzyme requires NAD(H) as a coenzyme, and is inhibited by inorganic phosphate. A similar gene in the rabbit is thought to serve a structural role in the lens of the eye. 51084 ENSG00000165475 CRYL1 crystallin lambda 1 NA
NA 25840 ENSG00000185432 METTL7A methyltransferase like 7A NA
This gene encodes a member of the peptidase family. The protein forms a homodimer that hydrolyzes dipeptides or tripeptides with C-terminal proline or hydroxyproline residues. The enzyme serves an important role in the recycling of proline, and may be rate limiting for the production of collagen. Mutations in this gene result in prolidase deficiency, which is characterized by the excretion of large amount of di- and tri-peptides containing proline. Multiple transcript variants encoding different isoforms have been found for this gene. 5184 ENSG00000124299 PEPD peptidase D NA
This gene encodes a member of the IQGAP family. The protein contains three IQ domains, one calponin homology domain, one Ras-GAP domain and one WW domain. It interacts with components of the cytoskeleton, with cell adhesion molecules, and with several signaling molecules to regulate cell morphology and motility. 10788 ENSG00000145703 IQGAP2 IQ motif containing GTPase activating protein 2 NA
This gene was originally cloned from human myeloblasts and found to be selectively expressed in inflamed colonic epithelium. This gene encodes a member of the olfactomedin family. The encoded protein is an antiapoptotic factor that promotes tumor growth and is an extracellular matrix glycoprotein that facilitates cell adhesion. 10562 ENSG00000102837 OLFM4 olfactomedin 4 NA
NA 122786 ENSG00000139926 FRMD6 FERM domain containing 6 NA
Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal phosphoprotein that is a component of the 60S subunit. The protein, which is a functional equivalent of the E. coli L7/L12 ribosomal protein, belongs to the L12P family of ribosomal proteins. It plays an important role in the elongation step of protein synthesis. Unlike most ribosomal proteins, which are basic, the encoded protein is acidic. Its C-terminal end is nearly identical to the C-terminal ends of the ribosomal phosphoproteins P0 and P2. The P1 protein can interact with P0 and P2 to form a pentameric complex consisting of P1 and P2 dimers, and a P0 monomer. The protein is located in the cytoplasm. Two alternatively spliced transcript variants that encode different proteins have been observed. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. 6176 ENSG00000137818 RPLP1 ribosomal protein lateral stalk subunit P1 NA
The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. 58 ENSG00000143632 ACTA1 actin, alpha 1, skeletal muscle NA
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 16, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
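
The export step above is repeated for each factor with the factor index typed in by hand. A minimal sketch of a helper that wraps this call is shown here. The names `write_factor_genes`, `out_dir`, and `toy_gene_list` are hypothetical and not part of the original analysis. The sketch also assumes that writing the rows of `gene_list` directly is an acceptable substitute for `out$query`, and it runs on a small stand-in matrix rather than on the real query results.

```r
# Hypothetical helper (not from the original analysis): write the gene IDs
# of factor k to "gene_names_clus_<k>.txt", one ID per line.
# `gene_list` is assumed to be a character matrix with one row per factor.
write_factor_genes <- function(gene_list, k, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(gene_list[k, ], path,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Toy stand-in for gene_list, using a few Ensembl IDs that appear in the tables
toy_gene_list <- rbind(c("ENSG00000149418", "ENSG00000181885"),
                       c("ENSG00000197971", "ENSG00000131095"))

# Write one file per factor into a temporary directory
out_dir <- tempdir()
paths <- vapply(seq_len(nrow(toy_gene_list)),
                function(k) write_factor_genes(toy_gene_list, k, out_dir),
                character(1))
```

Looping over `seq_len(nrow(gene_list))` in this way would keep the file name and the factor index in step, which is easy to get wrong when the index is edited manually in each section.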

Factor 17 Annotations

out <- mygene::queryMany(gene_list[17,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name X_id summary symbol query notfound
myelin basic protein 4155 The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. MBP ENSG00000197971 NA
NA ENSG00000266844 NA RP11-862L9.3 ENSG00000266844 NA
glial fibrillary acidic protein 2670 This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. GFAP ENSG00000131095 NA
alanyl aminopeptidase, membrane 290 Aminopeptidase N is located in the small-intestinal and renal microvillar membrane, and also in other plasma membranes. In the small intestine aminopeptidase N plays a role in the final digestion of peptides generated from hydrolysis of proteins by gastric and pancreatic proteases. Its function in proximal tubular epithelial cells and other cell types is less clear. The large extracellular carboxyterminal domain contains a pentapeptide consensus sequence characteristic of members of the zinc-binding metalloproteinase superfamily. Sequence comparisons with known enzymes of this class showed that CD13 and aminopeptidase N are identical. The latter enzyme was thought to be involved in the metabolism of regulatory peptides by diverse cell types, including small intestinal and renal tubular epithelial cells, macrophages, granulocytes, and synaptic membranes from the CNS. Human aminopeptidase N is a receptor for one strain of human coronavirus that is an important cause of upper respiratory tract infections. Defects in this gene appear to be a cause of various types of leukemia or lymphoma. ANPEP ENSG00000166825 NA
keratin 13 3860 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. KRT13 ENSG00000171401 NA
maturin, neural progenitor differentiation regulator homolog (Xenopus) 222166 NA MTURN ENSG00000180354 NA
S100 calcium binding protein B 6285 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21; however, this gene is located at 21q22.3. This protein may function in neurite extension, proliferation of melanoma cells, stimulation of Ca2+ fluxes, inhibition of PKC-mediated phosphorylation, astrocytosis and axonal proliferation, and inhibition of microtubule assembly. Chromosomal rearrangements and altered expression of this gene have been implicated in several neurological, neoplastic, and other types of diseases, including Alzheimer’s disease, Down’s syndrome, epilepsy, amyotrophic lateral sclerosis, melanoma, and type I diabetes. S100B ENSG00000160307 NA
regulator of G-protein signaling 1 5996 This gene encodes a member of the regulator of G-protein signalling family. This protein is located on the cytosolic side of the plasma membrane and contains a conserved, 120 amino acid motif called the RGS domain. The protein attenuates the signalling activity of G-proteins by binding to activated, GTP-bound G alpha subunits and acting as a GTPase activating protein (GAP), increasing the rate of conversion of the GTP to GDP. This hydrolysis allows the G alpha subunits to bind G beta/gamma subunit heterodimers, forming inactive G-protein heterotrimers, thereby terminating the signal. RGS1 ENSG00000090104 NA
latent transforming growth factor beta binding protein 4 8425 The protein encoded by this gene binds transforming growth factor beta (TGFB) as it is secreted and targeted to the extracellular matrix. TGFB is biologically latent after secretion and insertion into the extracellular matrix, and sheds TGFB and other proteins upon activation. Defects in this gene may be a cause of cutis laxa and severe pulmonary, gastrointestinal, and urinary abnormalities. Three transcript variants encoding different isoforms have been found for this gene. LTBP4 ENSG00000090006 NA
collagen type I alpha 1 1277 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. COL1A1 ENSG00000108821 NA
N-myc downstream regulated 1 10397 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein involved in stress responses, hormone responses, cell growth, and differentiation. The encoded protein is necessary for p53-mediated caspase activation and apoptosis. Mutations in this gene are a cause of Charcot-Marie-Tooth disease type 4D, and expression of this gene may be a prognostic indicator for several types of cancer. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NDRG1 ENSG00000104419 NA
keratin 4 3851 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. KRT4 ENSG00000170477 NA
heat shock protein 90kDa alpha family class A member 1 3320 The protein encoded by this gene is an inducible molecular chaperone that functions as a homodimer. The encoded protein aids in the proper folding of specific target proteins by use of an ATPase activity that is modulated by co-chaperones. Two transcript variants encoding different isoforms have been found for this gene. HSP90AA1 ENSG00000080824 NA
claudin domain containing 1 56650 NA CLDND1 ENSG00000080822 NA
pleckstrin homology domain containing B1 58473 NA PLEKHB1 ENSG00000021300 NA
collagen type I alpha 2 chain 1278 This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. COL1A2 ENSG00000164692 NA
membrane metallo-endopeptidase 4311 This gene encodes a common acute lymphocytic leukemia antigen that is an important cell surface marker in the diagnosis of human acute lymphocytic leukemia (ALL). This protein is present on leukemic cells of pre-B phenotype, which represent 85% of cases of ALL. This protein is not restricted to leukemic cells, however, and is found on a variety of normal tissues. It is a glycoprotein that is particularly abundant in kidney, where it is present on the brush border of proximal tubules and on glomerular epithelium. The protein is a neutral endopeptidase that cleaves peptides at the amino side of hydrophobic residues and inactivates several peptide hormones including glucagon, enkephalins, substance P, neurotensin, oxytocin, and bradykinin. This gene, which encodes a 100-kD type II transmembrane glycoprotein, exists in a single copy of greater than 45 kb. The 5’ untranslated region of this gene is alternatively spliced, resulting in four separate mRNA transcripts. The coding region is not affected by alternative splicing. MME ENSG00000196549 NA
tumor protein p53 inducible nuclear protein 2 58476 NA TP53INP2 ENSG00000078804 NA
progestin and adipoQ receptor family member 6 79957 NA PAQR6 ENSG00000160781 NA
eukaryotic translation elongation factor 2 1938 This gene encodes a member of the GTP-binding translation elongation factor family. This protein is an essential factor for protein synthesis. It promotes the GTP-dependent translocation of the nascent protein chain from the A-site to the P-site of the ribosome. This protein is completely inactivated by EF-2 kinase phosphorylation. EEF2 ENSG00000167658 NA
laminin subunit alpha 5 3911 This gene encodes one of the vertebrate laminin alpha chains. Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. The protein encoded by this gene is the alpha-5 subunit of laminin-10 (laminin-511), laminin-11 (laminin-521) and laminin-15 (laminin-523). LAMA5 ENSG00000130702 NA
CD9 molecule 928 This gene encodes a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Tetraspanins are cell surface glycoproteins with four transmembrane domains that form multimeric complexes with other cell surface proteins. The encoded protein functions in many cellular processes including differentiation, adhesion, and signal transduction, and expression of this gene plays a critical role in the suppression of cancer cell motility and metastasis. CD9 ENSG00000010278 NA
prostaglandin D2 synthase 5730 The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may be also involved in the regulation of non-rapid eye movement sleep. PTGDS ENSG00000107317 NA
ZFP36 ring finger protein 7538 NA ZFP36 ENSG00000128016 NA
myelin protein zero 4359 This gene is specifically expressed in Schwann cells of the peripheral nervous system and encodes a type I transmembrane glycoprotein that is a major structural protein of the peripheral myelin sheath. The encoded protein contains a large hydrophobic extracellular domain and a smaller basic intracellular domain, which are essential for the formation and stabilization of the multilamellar structure of the compact myelin. Mutations in this gene are associated with autosomal dominant form of Charcot-Marie-Tooth disease type 1 (CMT1B) and other polyneuropathies, such as Dejerine-Sottas syndrome (DSS) and congenital hypomyelinating neuropathy (CHN). A recent study showed that two isoforms are produced from the same mRNA by use of alternative in-frame translation termination codons via a stop codon readthrough mechanism. MPZ ENSG00000158887 NA
NA ENSG00000229732 NA AC019349.5 ENSG00000229732 NA
septin 4 5414 This gene is a member of the septin family of nucleotide binding proteins, originally described in yeast as cell division cycle regulatory proteins. Septins are highly conserved in yeast, Drosophila, and mouse, and appear to regulate cytoskeletal organization. Disruption of septin function disturbs cytokinesis and results in large multinucleate or polyploid cells. This gene is highly expressed in brain and heart. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. One of the isoforms (known as ARTS) is distinct; it is localized to the mitochondria, and has a role in apoptosis and cancer. SEPT4 ENSG00000108387 NA
interleukin 1 receptor antagonist 3557 The protein encoded by this gene is a member of the interleukin 1 cytokine family. This protein inhibits the activities of interleukin 1, alpha (IL1A) and interleukin 1, beta (IL1B), and modulates a variety of interleukin 1 related immune and inflammatory responses. This gene and five other closely related cytokine genes form a gene cluster spanning approximately 400 kb on chromosome 2. A polymorphism of this gene is reported to be associated with increased risk of osteoporotic fractures and gastric cancer. Several alternatively spliced transcript variants encoding distinct isoforms have been reported. IL1RN ENSG00000136689 NA
cystatin A 1475 The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins, and kininogens. This gene encodes a stefin that functions as a cysteine protease inhibitor, forming tight complexes with papain and the cathepsins B, H, and L. The protein is one of the precursor proteins of cornified cell envelope in keratinocytes and plays a role in epidermal development and maintenance. Stefins have been proposed as prognostic and diagnostic tools for cancer. CSTA ENSG00000121552 NA
plakophilin 4 8502 Armadillo-like proteins are characterized by a series of armadillo repeats, first defined in the Drosophila ‘armadillo’ gene product, that are typically 42 to 45 amino acids in length. These proteins can be divided into subfamilies based on their number of repeats, their overall sequence similarity, and the dispersion of the repeats throughout their sequences. Members of the p120(ctn)/plakophilin subfamily of Armadillo-like proteins include CTNND1, CTNND2, PKP1, PKP2, PKP4, and ARVCF. PKP4 may be a component of desmosomal plaque and other adhesion plaques and is thought to be involved in regulating junctional plaque organization and cadherin function. Multiple transcript variants encoding different isoforms have been found for this gene. PKP4 ENSG00000144283 NA
CD63 molecule 967 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. The encoded protein is a cell surface glycoprotein that is known to complex with integrins. It may function as a blood platelet activation marker. Deficiency of this protein is associated with Hermansky-Pudlak syndrome. Also this gene has been associated with tumor progression. Alternative splicing results in multiple transcript variants encoding different protein isoforms. CD63 ENSG00000135404 NA
ankyrin repeat domain 1 27063 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. ANKRD1 ENSG00000148677 NA
small proline rich protein 3 6707 NA SPRR3 ENSG00000163209 NA
pentraxin 3 5806 NA PTX3 ENSG00000163661 NA
endoplasmic reticulum-golgi intermediate compartment 1 57222 This gene encodes a cycling membrane protein which is an endoplasmic reticulum-golgi intermediate compartment (ERGIC) protein which interacts with other members of this protein family to increase their turnover. ERGIC1 ENSG00000113719 NA
S100 calcium binding protein A9 6280 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. S100A9 ENSG00000163220 NA
serpin family B member 9 5272 This gene encodes a member of the serine protease inhibitor family which are also known as serpins. The encoded protein belongs to a subfamily of intracellular serpins. This protein inhibits the activity of the effector molecule granzyme B. Overexpression of this protein may prevent cytotoxic T-lymphocytes from eliminating certain tumor cells. A pseudogene of this gene is found on chromosome 6. SERPINB9 ENSG00000170542 NA
obscurin-like 1 23363 Cytoskeletal adaptor proteins function in linking the internal cytoskeleton of cells to the cell membrane. This gene encodes a cytoskeletal adaptor protein, which is a member of the Unc-89/obscurin family. The protein contains multiple N- and C-terminal immunoglobulin (Ig)-like domains and a central fibronectin type 3 domain. Mutations in this gene cause 3M syndrome type 2. Alternatively spliced transcript variants encoding different isoforms have been found in this gene. OBSL1 ENSG00000124006 NA
StAR related lipid transfer domain containing 9 57519 NA STARD9 ENSG00000159433 NA
XIAP associated factor 1 54739 This gene encodes a protein which binds to and counteracts the inhibitory effect of a member of the IAP (inhibitor of apoptosis) protein family. IAP proteins bind to and inhibit caspases which are activated during apoptosis. The proportion of IAPs and proteins which interfere with their activity, such as the encoded protein, affect the progress of the apoptosis signaling pathway. Multiple transcript variants encoding different isoforms have been found for this gene. XAF1 ENSG00000132530 NA
basic helix-loop-helix family member e40 8553 This gene encodes a basic helix-loop-helix protein expressed in various tissues. The encoded protein can interact with ARNTL or compete for E-box binding sites in the promoter of PER1 and repress CLOCK/ARNTL’s transactivation of PER1. This gene is believed to be involved in the control of circadian rhythm and cell differentiation. BHLHE40 ENSG00000134107 NA
cystatin B 1476 The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and kininogens. This gene encodes a stefin that functions as an intracellular thiol protease inhibitor. The protein is able to form a dimer stabilized by noncovalent forces, inhibiting papain and cathepsins l, h and b. The protein is thought to play a role in protecting against the proteases leaking from lysosomes. Evidence indicates that mutations in this gene are responsible for the primary defects in patients with progressive myoclonic epilepsy (EPM1). CSTB ENSG00000160213 NA
hypoxia inducible lipid droplet associated 29923 NA HILPDA ENSG00000135245 NA
thioredoxin reductase 1 7296 This gene encodes a member of the family of pyridine nucleotide oxidoreductases. This protein reduces thioredoxins as well as other substrates, and plays a role in selenium metabolism and protection against oxidative stress. The functional enzyme is thought to be a homodimer which uses FAD as a cofactor. Each subunit contains a selenocysteine (Sec) residue which is required for catalytic activity. The selenocysteine is encoded by the UGA codon that normally signals translation termination. The 3’ UTR of selenocysteine-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), that is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. Alternative splicing results in several transcript variants encoding the same or different isoforms. TXNRD1 ENSG00000198431 NA
transferrin 7018 This gene encodes a glycoprotein with an approximate molecular weight of 76.5 kDa. It is thought to have been created as a result of an ancient gene duplication event that led to generation of homologous C and N-terminal domains each of which binds one ion of ferric iron. The function of this protein is to transport iron from the intestine, reticuloendothelial system, and liver parenchymal cells to all proliferating cells in the body. This protein may also have a physiologic role as granulocyte/pollen-binding protein (GPBP) involved in the removal of certain organic matter and allergens from serum. TF ENSG00000091513 NA
periaxin 57716 This gene encodes a protein involved in peripheral nerve myelin upkeep. The encoded protein contains 2 PDZ domains which were named after PSD95 (post synaptic density protein), DlgA (Drosophila disc large tumor suppressor), and ZO1 (a mammalian tight junction protein). Two alternatively spliced transcript variants have been described for this gene which encode different protein isoforms and which are targeted differently in the Schwann cell. Mutations in this gene cause Charcot-Marie-Tooth neuropathy, type 4F and Dejerine-Sottas neuropathy. PRX ENSG00000105227 NA
semaphorin 4C 54910 NA SEMA4C ENSG00000168758 NA
IKAROS family zinc finger 2 22807 This gene encodes a member of the Ikaros family of zinc-finger proteins. Three members of this protein family (Ikaros, Aiolos and Helios) are hematopoietic-specific transcription factors involved in the regulation of lymphocyte development. This protein forms homo- or hetero-dimers with other Ikaros family members, and is thought to function predominantly in early hematopoietic development. Multiple transcript variants encoding different isoforms have been found for this gene, but the biological validity of some variants has not been determined. IKZF2 ENSG00000030419 NA
WNK lysine deficient protein kinase 1 65125 This gene encodes a member of the WNK subfamily of serine/threonine protein kinases. The encoded protein may be a key regulator of blood pressure by controlling the transport of sodium and chloride ions. Mutations in this gene have been associated with pseudohypoaldosteronism type II and hereditary sensory neuropathy type II. Alternatively spliced transcript variants encoding different isoforms have been described but the full-length nature of all of them has yet to be determined. WNK1 ENSG00000060237 NA
spectrin beta, non-erythrocytic 1 6711 Spectrin is an actin crosslinking and molecular scaffold protein that links the plasma membrane to the actin cytoskeleton, and functions in the determination of cell shape, arrangement of transmembrane proteins, and organization of organelles. It is composed of two antiparallel dimers of alpha- and beta- subunits. This gene is one member of a family of beta-spectrin genes. The encoded protein contains an N-terminal actin-binding domain, and 17 spectrin repeats which are involved in dimer formation. Multiple transcript variants encoding different isoforms have been found for this gene. SPTBN1 ENSG00000115306 NA
high density lipoprotein binding protein 3069 The protein encoded by this gene binds high density lipoprotein (HDL) and may function to regulate excess cholesterol levels in cells. The encoded protein also binds RNA and can induce heterochromatin formation. HDLBP ENSG00000115677 NA
myelin protein zero like 2 10205 Thymus development depends on a complex series of interactions between thymocytes and the stromal component of the organ. Epithelial V-like antigen (EVA) is expressed in thymus epithelium and strongly downregulated by thymocyte developmental progression. This gene is expressed in the thymus and in several epithelial structures early in embryogenesis. It is highly homologous to the myelin protein zero and, in thymus-derived epithelial cell lines, is poorly soluble in nonionic detergents, strongly suggesting an association to the cytoskeleton. Its capacity to mediate cell adhesion through a homophilic interaction and its selective regulation by T cell maturation might imply the participation of EVA in the earliest phases of thymus organogenesis. The protein bears a characteristic V-type domain and two potential N-glycosylation sites in the extracellular domain; a putative serine phosphorylation site for casein kinase 2 is also present in the cytoplasmic tail. Two transcript variants encoding the same protein have been found for this gene. MPZL2 ENSG00000149573 NA
small ArfGAP2 64744 NA SMAP2 ENSG00000084070 NA
vimentin 7431 This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. VIM ENSG00000026025 NA
arginine and glutamate rich 1 55082 NA ARGLU1 ENSG00000134884 NA
dystonin 667 This gene encodes a member of the plakin protein family of adhesion junction plaque proteins. Multiple alternatively spliced transcript variants encoding distinct isoforms have been found for this gene, but the full-length nature of some variants has not been defined. It has been reported that some isoforms are expressed in neural and muscle tissue, anchoring neural intermediate filaments to the actin cytoskeleton, and some isoforms are expressed in epithelial tissue, anchoring keratin-containing intermediate filaments to hemidesmosomes. Consistent with the expression, mice defective for this gene show skin blistering and neurodegeneration. DST ENSG00000151914 NA
DYX1C1-CCPG1 readthrough (NMD candidate) 100533483 This locus represents naturally occurring read-through transcription between the neighboring dyslexia susceptibility 1 candidate 1 (DYX1C1) and cell cycle progression 1 (CCPG1) genes on chromosome 15. The read-through transcript is a candidate for nonsense-mediated mRNA decay (NMD), and is thus unlikely to produce a protein product. DYX1C1-CCPG1 ENSG00000261771 NA
2’-5’-oligoadenylate synthetase 3 4940 This gene encodes an enzyme included in the 2’, 5’ oligoadenylate synthase family. This enzyme is induced by interferons and catalyzes the 2’, 5’ oligomers of adenosine in order to bind and activate RNase L. This enzyme family plays a significant role in the inhibition of cellular protein synthesis and viral infection resistance. OAS3 ENSG00000111331 NA
brain protein I3 25798 NA BRI3 ENSG00000164713 NA
CDC like kinase 1 1195 This gene encodes a member of the CDC2-like (or LAMMER) family of dual specificity protein kinases. In the nucleus, the encoded protein phosphorylates serine/arginine-rich proteins involved in pre-mRNA processing, releasing them into the nucleoplasm. The choice of splice sites during pre-mRNA processing may be regulated by the concentration of transacting factors, including serine/arginine rich proteins. Therefore, the encoded protein may play an indirect role in governing splice site selection. Multiple transcript variants encoding different isoforms have been found for this gene. CLK1 ENSG00000013441 NA
titin 7273 This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. TTN ENSG00000155657 NA
colony stimulating factor 3 receptor 1441 The protein encoded by this gene is the receptor for colony stimulating factor 3, a cytokine that controls the production, differentiation, and function of granulocytes. The encoded protein, which is a member of the family of cytokine receptors, may also function in some cell surface adhesion or recognition processes. Alternatively spliced transcript variants have been described. Mutations in this gene are a cause of Kostmann syndrome, also known as severe congenital neutropenia. CSF3R ENSG00000119535 NA
carboxypeptidase A1 1357 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. CPA1 ENSG00000091704 NA
von Willebrand factor A domain containing 1 64856 VWA1 belongs to the von Willebrand factor (VWF; MIM 613160) A (VWFA) domain superfamily of extracellular matrix proteins and appears to play a role in cartilage structure and function (Fitzgerald et al., 2002 [PubMed 12062410]). VWA1 ENSG00000179403 NA
heat shock protein family A (Hsp70) member 1B 3304 This intronless gene encodes a 70kDa heat shock protein which is a member of the heat shock protein 70 family. In conjunction with other heat shock proteins, this protein stabilizes existing proteins against aggregation and mediates the folding of newly translated proteins in the cytosol and in organelles. It is also involved in the ubiquitin-proteasome pathway through interaction with the AU-rich element RNA-binding protein 1. The gene is located in the major histocompatibility complex class III region, in a cluster with two closely related genes which encode similar proteins. HSPA1B ENSG00000204388 NA
prolyl 4-hydroxylase subunit beta 5034 This gene encodes the beta subunit of prolyl 4-hydroxylase, a highly abundant multifunctional enzyme that belongs to the protein disulfide isomerase family. When present as a tetramer consisting of two alpha and two beta subunits, this enzyme is involved in hydroxylation of prolyl residues in preprocollagen. This enzyme is also a disulfide isomerase containing two thioredoxin domains that catalyze the formation, breakage and rearrangement of disulfide bonds. Other known functions include its ability to act as a chaperone that inhibits aggregation of misfolded proteins in a concentration-dependent manner, its ability to bind thyroid hormone, its role in both the influx and efflux of S-nitrosothiol-bound nitric oxide, and its function as a subunit of the microsomal triglyceride transfer protein complex. P4HB ENSG00000185624 NA
cytochrome P450 family 17 subfamily A member 1 1586 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. It has both 17alpha-hydroxylase and 17,20-lyase activities and is a key enzyme in the steroidogenic pathway that produces progestins, mineralocorticoids, glucocorticoids, androgens, and estrogens. Mutations in this gene are associated with isolated steroid-17 alpha-hydroxylase deficiency, 17-alpha-hydroxylase/17,20-lyase deficiency, pseudohermaphroditism, and adrenal hyperplasia. CYP17A1 ENSG00000148795 NA
LRRC75A antisense RNA 1 125144 NA LRRC75A-AS1 ENSG00000175061 NA
LDL receptor related protein associated protein 1 4043 This gene encodes a protein that interacts with the low density lipoprotein (LDL) receptor-related protein and facilitates its proper folding and localization by preventing the binding of ligands. Mutations in this gene have been identified in individuals with myopia 23. Alternative splicing results in multiple transcript variants. LRPAP1 ENSG00000163956 NA
acyl-CoA synthetase long-chain family member 5 51703 The protein encoded by this gene is an isozyme of the long-chain fatty-acid-coenzyme A ligase family. Although differing in substrate specificity, subcellular localization, and tissue distribution, all isozymes of this family convert free long-chain fatty acids into fatty acyl-CoA esters, and thereby play a key role in lipid biosynthesis and fatty acid degradation. This isozyme is highly expressed in uterus and spleen, and in trace amounts in normal brain, but has markedly increased levels in malignant gliomas. This gene functions in mediating fatty acid-induced glioma cell growth. Three transcript variants encoding two different isoforms have been found for this gene. ACSL5 ENSG00000197142 NA
cornulin 49860 This gene encodes a member of the ‘fused gene’ family of proteins, which contain N-terminus EF-hand domains and multiple tandem peptide repeats. The encoded protein contains two EF-hand Ca2+ binding domains in its N-terminus and two glutamine- and threonine-rich 60 amino acid repeats in its C-terminus. This gene, also known as squamous epithelial heat shock protein 53, may play a role in the mucosal/epithelial immune response and epidermal differentiation. CRNN ENSG00000143536 NA
activating transcription factor 3 467 This gene encodes a member of the mammalian activation transcription factor/cAMP responsive element-binding (CREB) protein family of transcription factors. This gene is induced by a variety of signals, including many of those encountered by cancer cells, and is involved in the complex process of cellular stress response. Multiple transcript variants encoding different isoforms have been found for this gene. It is possible that alternative splicing of this gene may be physiologically important in the regulation of target genes. ATF3 ENSG00000162772 NA
QKI, KH domain containing, RNA binding 9444 The protein encoded by this gene is an RNA-binding protein that regulates pre-mRNA splicing, export of mRNAs from the nucleus, protein translation, and mRNA stability. The encoded protein is involved in myelinization and oligodendrocyte differentiation and may play a role in schizophrenia. Multiple transcript variants encoding different isoforms have been found for this gene. QKI ENSG00000112531 NA
regenerating family member 1 alpha 5967 This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. REG1A ENSG00000115386 NA
transforming growth factor beta induced 7045 This gene encodes an RGD-containing protein that binds to type I, II and IV collagens. The RGD motif is found in many extracellular matrix proteins modulating cell adhesion and serves as a ligand recognition sequence for several integrins. This protein plays a role in cell-collagen interactions and may be involved in endochondral bone formation in cartilage. The protein is induced by transforming growth factor-beta and acts to inhibit cell adhesion. Mutations in this gene are associated with multiple types of corneal dystrophy. TGFBI ENSG00000120708 NA
protease, serine 1 5644 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. PRSS1 ENSG00000204983 NA
heterogeneous nuclear ribonucleoprotein A3 220988 NA HNRNPA3 ENSG00000170144 NA
2’-5’-oligoadenylate synthetase 2 4939 This gene encodes a member of the 2-5A synthetase family, essential proteins involved in the innate immune response to viral infection. The encoded protein is induced by interferons and uses adenosine triphosphate in 2’-specific nucleotidyl transfer reactions to synthesize 2’,5’-oligoadenylates (2-5As). These molecules activate latent RNase L, which results in viral RNA degradation and the inhibition of viral replication. The three known members of this gene family are located in a cluster on chromosome 12. Alternatively spliced transcript variants encoding different isoforms have been described. OAS2 ENSG00000111335 NA
phosphatidylinositol-5-phosphate 4-kinase type 2 alpha 5305 Phosphatidylinositol-5,4-bisphosphate, the precursor to second messengers of the phosphoinositide signal transduction pathways, is thought to be involved in the regulation of secretion, cell proliferation, differentiation, and motility. The protein encoded by this gene is one of a family of enzymes capable of catalyzing the phosphorylation of phosphatidylinositol-5-phosphate on the fourth hydroxyl of the myo-inositol ring to form phosphatidylinositol-5,4-bisphosphate. The amino acid sequence of this enzyme does not show homology to other kinases, but the recombinant protein does exhibit kinase activity. This gene is a member of the phosphatidylinositol-5-phosphate 4-kinase family. PIP4K2A ENSG00000150867 NA
selectin P ligand 6404 This gene encodes a glycoprotein that functions as a high affinity counter-receptor for the cell adhesion molecules P-, E- and L- selectin expressed on myeloid cells and stimulated T lymphocytes. As such, this protein plays a critical role in leukocyte trafficking during inflammation by tethering of leukocytes to activated platelets or endothelia expressing selectins. This protein requires two post-translational modifications, tyrosine sulfation and the addition of the sialyl Lewis x tetrasaccharide (sLex) to its O-linked glycans, for its high-affinity binding activity. Aberrant expression of this gene and polymorphisms in this gene are associated with defects in the innate and adaptive immune response. Alternate splicing results in multiple transcript variants. SELPLG ENSG00000110876 NA
keratin 10 3858 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. KRT10 ENSG00000186395 NA
OS9, endoplasmic reticulum lectin 10956 This gene encodes a protein that is highly expressed in osteosarcomas. This protein binds to the hypoxia-inducible factor 1 (HIF-1), a key regulator of the hypoxic response and angiogenesis, and promotes the degradation of one of its subunits. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. OS9 ENSG00000135506 NA
lysophosphatidic acid receptor 6 10161 The protein encoded by this gene belongs to the family of G-protein coupled receptors, that are preferentially activated by adenosine and uridine nucleotides. This gene aligns with an internal intron of the retinoblastoma susceptibility gene in the reverse orientation. Alternative splicing results in multiple transcript variants. LPAR6 ENSG00000139679 NA
lipase F, gastric type 8513 This gene encodes gastric lipase, an enzyme involved in the digestion of dietary triglycerides in the gastrointestinal tract, and responsible for 30% of fat digestion processes occurring in human. It is secreted by gastric chief cells in the fundic mucosa of the stomach, and it hydrolyzes the ester bonds of triglycerides under acidic pH conditions. The gene is a member of a conserved gene family of lipases that play distinct roles in neutral lipid metabolism. Several transcript variants encoding different isoforms have been found for this gene. LIPF ENSG00000182333 NA
TAR DNA binding protein 23435 HIV-1, the causative agent of acquired immunodeficiency syndrome (AIDS), contains an RNA genome that produces a chromosomally integrated DNA during the replicative cycle. Activation of HIV-1 gene expression by the transactivator Tat is dependent on an RNA regulatory element (TAR) located downstream of the transcription initiation site. The protein encoded by this gene is a transcriptional repressor that binds to chromosomally integrated TAR DNA and represses HIV-1 transcription. In addition, this protein regulates alternate splicing of the CFTR gene. A similar pseudogene is present on chromosome 20. TARDBP ENSG00000120948 NA
adipocyte plasma membrane associated protein 57136 NA APMAP ENSG00000101474 NA
phosphatidylinositol-4-phosphate 3-kinase catalytic subunit type 2 beta 5287 The protein encoded by this gene belongs to the phosphoinositide 3-kinase (PI3K) family. PI3-kinases play roles in signaling pathways involved in cell proliferation, oncogenic transformation, cell survival, cell migration, and intracellular protein trafficking. This protein contains a lipid kinase catalytic domain as well as a C-terminal C2 domain, a characteristic of class II PI3-kinases. C2 domains act as calcium-dependent phospholipid binding motifs that mediate translocation of proteins to membranes, and may also mediate protein-protein interactions. The PI3-kinase activity of this protein is sensitive to low nanomolar levels of the inhibitor wortmannin. The C2 domain of this protein was shown to bind phospholipids but not Ca2+, which suggests that this enzyme may function in a calcium-independent manner. PIK3C2B ENSG00000133056 NA
myeloid cell nuclear differentiation antigen 4332 The myeloid cell nuclear differentiation antigen (MNDA) is detected only in nuclei of cells of the granulocyte-monocyte lineage. A 200-amino acid region of human MNDA is strikingly similar to a region in the proteins encoded by a family of interferon-inducible mouse genes, designated Ifi-201, Ifi-202, and Ifi-203, that are not regulated in a cell- or tissue-specific fashion. The 1.8-kb MNDA mRNA, which contains an interferon-stimulated response element in the 5-prime untranslated region, was significantly upregulated in human monocytes exposed to interferon alpha. MNDA is located within 2,200 kb of FCER1A, APCS, CRP, and SPTA1. In its pattern of expression and/or regulation, MNDA resembles IFI16, suggesting that these genes participate in blood cell-specific responses to interferons. MNDA ENSG00000163563 NA
syndecan 1 6382 The protein encoded by this gene is a transmembrane (type I) heparan sulfate proteoglycan and is a member of the syndecan proteoglycan family. The syndecans mediate cell binding, cell signaling, and cytoskeletal organization and syndecan receptors are required for internalization of the HIV-1 tat protein. The syndecan-1 protein functions as an integral membrane protein and participates in cell proliferation, cell migration and cell-matrix interactions via its receptor for extracellular matrix proteins. Altered syndecan-1 expression has been detected in several different tumor types. While several transcript variants may exist for this gene, the full-length natures of only two have been described to date. These two represent the major variants of this gene and encode the same protein. SDC1 ENSG00000115884 NA
chymotrypsin like elastase family member 3A 10136 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. CELA3A ENSG00000142789 NA
ornithine decarboxylase antizyme 1 4946 The protein encoded by this gene belongs to the ornithine decarboxylase antizyme family, which plays a role in cell growth and proliferation by regulating intracellular polyamine levels. Expression of antizymes requires +1 ribosomal frameshifting, which is enhanced by high levels of polyamines. Antizymes in turn bind to and inhibit ornithine decarboxylase (ODC), the key enzyme in polyamine biosynthesis; thus, completing the auto-regulatory circuit. This gene encodes antizyme 1, the first member of the antizyme family, that has broad tissue distribution, and negatively regulates intracellular polyamine levels by binding to and targeting ODC for degradation, as well as inhibiting polyamine uptake. Antizyme 1 mRNA contains two potential in-frame AUGs; and studies in rat suggest that alternative use of the two translation initiation sites results in N-terminally distinct protein isoforms with different subcellular localization. Alternatively spliced transcript variants have also been noted for this gene. OAZ1 ENSG00000104904 NA
misshapen like kinase 1 50488 This gene encodes a serine/threonine kinase belonging to the germinal center kinase (GCK) family. The protein is structurally similar to the kinases that are related to NIK and may belong to a distinct subfamily of NIK-related kinases within the GCK family. Studies of the mouse homolog indicate an up-regulation of expression in the course of postnatal mouse cerebral development and activation of the cJun N-terminal kinase (JNK) and the p38 pathways. MINK1 ENSG00000141503 NA
AHNAK nucleoprotein 79026 NA AHNAK ENSG00000124942 NA
Rap guanine nucleotide exchange factor 5 9771 Members of the RAS (see HRAS; MIM 190020) subfamily of GTPases function in signal transduction as GTP/GDP-regulated switches that cycle between inactive GDP- and active GTP-bound states. Guanine nucleotide exchange factors (GEFs), such as RAPGEF5, serve as RAS activators by promoting acquisition of GTP to maintain the active GTP-bound state and are the key link between cell surface receptors and RAS activation (Rebhun et al., 2000 [PubMed 10934204]). RAPGEF5 ENSG00000136237 NA
intercellular adhesion molecule 3 3385 The protein encoded by this gene is a member of the intercellular adhesion molecule (ICAM) family. All ICAM proteins are type I transmembrane glycoproteins, contain 2-9 immunoglobulin-like C2-type domains, and bind to the leukocyte adhesion LFA-1 protein. This protein is constitutively and abundantly expressed by all leucocytes and may be the most important ligand for LFA-1 in the initiation of the immune response. It functions not only as an adhesion molecule, but also as a potent signalling molecule. Alternative splicing results in multiple transcript variants encoding different isoforms. ICAM3 ENSG00000076662 NA
baculoviral IAP repeat containing 3 330 This gene encodes a member of the IAP family of proteins that inhibit apoptosis by binding to tumor necrosis factor receptor-associated factors TRAF1 and TRAF2, probably by interfering with activation of ICE-like proteases. The encoded protein inhibits apoptosis induced by serum deprivation but does not affect apoptosis resulting from exposure to menadione, a potent inducer of free radicals. It contains 3 baculovirus IAP repeats and a ring finger domain. Transcript variants encoding the same isoform have been identified. BIRC3 ENSG00000023445 NA
peripheral myelin protein 22 5376 This gene encodes an integral membrane protein that is a major component of myelin in the peripheral nervous system. Studies suggest two alternately used promoters drive tissue-specific expression. Various mutations of this gene are causes of Charcot-Marie-Tooth disease Type IA, Dejerine-Sottas syndrome, and hereditary neuropathy with liability to pressure palsies. Alternative splicing results in multiple transcript variants. PMP22 ENSG00000109099 NA
NA NA NA NA ENSG00000140181 TRUE
albumin 213 Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. ALB ENSG00000163631 NA
NA NA NA NA ENSG00000259716 TRUE
# Write the Ensembl IDs queried for factor 17 to a text file, one per line.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 17, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);
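
Two of the IDs in the table above (ENSG00000140181 and ENSG00000259716) came back with notfound set to TRUE, yet the write.table() call saves every queried ID regardless. The sketch below shows one possible way to separate matched from unmatched IDs before saving. It is only an illustration: `annot` is a toy stand-in for `as.data.frame(out)` built from four rows of the table, `split_by_match` is a hypothetical helper that is not part of the sourced SFA scripts, and the output goes to a temporary file rather than the project directory.

```r
# Toy stand-in for as.data.frame(out); values copied from the Factor 17 table.
annot <- data.frame(
  query    = c("ENSG00000109099", "ENSG00000140181",
               "ENSG00000163631", "ENSG00000259716"),
  symbol   = c("PMP22", NA, "ALB", NA),
  notfound = c(NA, TRUE, NA, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split the queried IDs into annotated and unmatched ones.
# `notfound` is NA for matched queries and TRUE for unmatched ones.
split_by_match <- function(df) {
  missing <- !is.na(df$notfound) & df$notfound
  list(found = df$query[!missing], unmatched = df$query[missing])
}

res <- split_by_match(annot)

# Save the unmatched IDs one per line, using the same write.table() options
# as the report (temporary file used here instead of ../utilities/...).
tmp <- tempfile(fileext = ".txt")
write.table(res$unmatched, tmp,
            col.names = FALSE, row.names = FALSE, quote = FALSE)
```

Keeping the unmatched IDs in a separate file makes it easier to check them by hand, for instance against a newer Ensembl release, without changing the full list the report already writes.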

Factor 18 Annotations

out <- mygene::queryMany(gene_list[18,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id summary query name notfound
HBB 3043 The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin cause beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. ENSG00000244734 hemoglobin subunit beta NA
GLUL 2752 The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. ENSG00000135821 glutamate-ammonia ligase NA
GPX3 2878 This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. ENSG00000211445 glutathione peroxidase 3 NA
HBA2 3040 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. ENSG00000188536 hemoglobin subunit alpha 2 NA
PRL 5617 This gene encodes the anterior pituitary hormone prolactin. This secreted hormone is a growth regulator for many tissues, including cells of the immune system. It may also play a role in cell survival by suppressing apoptosis, and it is essential for lactation. Alternative splicing results in multiple transcript variants that encode the same protein. ENSG00000172179 prolactin NA
AHNAK 79026 NA ENSG00000124942 AHNAK nucleoprotein NA
LPL 4023 LPL encodes lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue. LPL functions as a homodimer, and has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Severe mutations that cause LPL deficiency result in type I hyperlipoproteinemia, while less extreme mutations in LPL are linked to many disorders of lipoprotein metabolism. ENSG00000175445 lipoprotein lipase NA
LTBP2 4053 The protein encoded by this gene belongs to the family of latent transforming growth factor (TGF)-beta binding proteins (LTBP), which are extracellular matrix proteins with multi-domain structure. This protein is the largest member of the LTBP family possessing unique regions and with most similarity to the fibrillins. It has thus been suggested that it may have multiple functions: as a member of the TGF-beta latent complex, as a structural component of microfibrils, and a role in cell adhesion. ENSG00000119681 latent transforming growth factor beta binding protein 2 NA
FASN 2194 The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. ENSG00000169710 fatty acid synthase NA
VWF 7450 This gene encodes a glycoprotein involved in hemostasis. The encoded preproprotein is proteolytically processed following assembly into large multimeric complexes. These complexes function in the adhesion of platelets to sites of vascular injury and the transport of various proteins in the blood. Mutations in this gene result in von Willebrand disease, an inherited bleeding disorder. An unprocessed pseudogene has been found on chromosome 22. ENSG00000110799 von Willebrand factor NA
GH1 2688 The protein encoded by this gene is a member of the somatotropin/prolactin family of hormones which play an important role in growth control. The gene, along with four other related genes, is located at the growth hormone locus on chromosome 17 where they are interspersed in the same transcriptional orientation; an arrangement which is thought to have evolved by a series of gene duplications. The five genes share a remarkably high degree of sequence identity. Alternative splicing generates additional isoforms of each of the five growth hormones, leading to further diversity and potential for specialization. This particular family member is expressed in the pituitary but not in placental tissue as is the case for the other four genes in the growth hormone locus. Mutations in or deletions of the gene lead to growth hormone deficiency and short stature. ENSG00000259384 growth hormone 1 NA
ACACB 32 Acetyl-CoA carboxylase (ACC) is a complex multifunctional enzyme system. ACC is a biotin-containing enzyme which catalyzes the carboxylation of acetyl-CoA to malonyl-CoA, the rate-limiting step in fatty acid synthesis. ACC-beta is thought to control fatty acid oxidation by means of the ability of malonyl-CoA to inhibit carnitine-palmitoyl-CoA transferase I, the rate-limiting step in fatty acid uptake and oxidation by mitochondria. ACC-beta may be involved in the regulation of fatty acid oxidation, rather than fatty acid biosynthesis. There is evidence for the presence of two ACC-beta isoforms. ENSG00000076555 acetyl-CoA carboxylase beta NA
NA NA NA ENSG00000117289 NA TRUE
PLIN4 729359 Members of the perilipin family, such as PLIN4, coat intracellular lipid storage droplets (Wolins et al., 2003 [PubMed 12840023]). ENSG00000167676 perilipin 4 NA
GNAS 2778 This locus has a highly complex imprinted expression pattern. It gives rise to maternally, paternally, and biallelically expressed transcripts that are derived from four alternative promoters and 5’ exons. Some transcripts contain a differentially methylated region (DMR) at their 5’ exons, and this DMR is commonly found in imprinted genes and correlates with transcript expression. An antisense transcript is produced from an overlapping locus on the opposite strand. One of the transcripts produced from this locus, and the antisense transcript, are paternally expressed noncoding RNAs, and may regulate imprinting in this region. In addition, one of the transcripts contains a second overlapping ORF, which encodes a structurally unrelated protein - Alex. Alternative splicing of downstream exons is also observed, which results in different forms of the stimulatory G-protein alpha subunit, a key element of the classical signal transduction pathway linking receptor-ligand interactions with the activation of adenylyl cyclase and a variety of cellular responses. Multiple transcript variants encoding different isoforms have been found for this gene. Mutations in this gene result in pseudohypoparathyroidism type 1a, pseudohypoparathyroidism type 1b, Albright hereditary osteodystrophy, pseudopseudohypoparathyroidism, McCune-Albright syndrome, progressive osseous heteroplasia, polyostotic fibrous dysplasia of bone, and some pituitary tumors. ENSG00000087460 GNAS complex locus NA
PIEZO1 9780 The protein encoded by this gene is a mechanically-activated ion channel that links mechanical forces to biological signals. The encoded protein contains 36 transmembrane domains and functions as a homotetramer. Defects in this gene have been associated with dehydrated hereditary stomatocytosis. ENSG00000103335 piezo type mechanosensitive ion channel component 1 NA
LAMB1 3912 Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. Several isoforms of each chain have been described. Different alpha, beta and gamma chain isomers combine to give rise to different heterotrimeric laminin isoforms which are designated by Arabic numerals in the order of their discovery, i.e. alpha1beta1gamma1 heterotrimer is laminin 1. The biological functions of the different chains and trimer molecules are largely unknown, but some of the chains have been shown to differ with respect to their tissue distribution, presumably reflecting diverse functions in vivo. This gene encodes the beta chain isoform laminin, beta 1. The beta 1 chain has 7 structurally distinct domains which it shares with other beta chain isomers. The C-terminal helical region containing domains I and II are separated by domain alpha, domains III and V contain several EGF-like repeats, and domains IV and VI have a globular conformation. Laminin, beta 1 is expressed in most tissues that produce basement membranes, and is one of the 3 chains constituting laminin 1, the first laminin isolated from Engelbreth-Holm-Swarm (EHS) tumor. A sequence in the beta 1 chain that is involved in cell attachment, chemotaxis, and binding to the laminin receptor was identified and shown to have the capacity to inhibit metastasis. ENSG00000091136 laminin subunit beta 1 NA
POMC 5443 This gene encodes a preproprotein that undergoes extensive, tissue-specific, post-translational processing via cleavage by subtilisin-like enzymes known as prohormone convertases. There are eight potential cleavage sites within the preproprotein and, depending on tissue type and the available convertases, processing may yield as many as ten biologically active peptides involved in diverse cellular functions. The encoded protein is synthesized mainly in corticotroph cells of the anterior pituitary where four cleavage sites are used; adrenocorticotrophin, essential for normal steroidogenesis and the maintenance of normal adrenal weight, and lipotropin beta are the major end products. In other tissues, including the hypothalamus, placenta, and epithelium, all cleavage sites may be used, giving rise to peptides with roles in pain and energy homeostasis, melanocyte stimulation, and immune modulation. These include several distinct melanotropins, lipotropins, and endorphins that are contained within the adrenocorticotrophin and beta-lipotropin peptides. The antimicrobial melanotropin alpha peptide exhibits antibacterial and antifungal activity. Mutations in this gene have been associated with early onset obesity, adrenal insufficiency, and red hair pigmentation. Alternatively spliced transcript variants encoding the same protein have been described. ENSG00000115138 proopiomelanocortin NA
SHANK3 ENSG00000251322 NA ENSG00000251322 SH3 and multiple ankyrin repeat domains 3 NA
CD163 9332 The protein encoded by this gene is a member of the scavenger receptor cysteine-rich (SRCR) superfamily, and is exclusively expressed in monocytes and macrophages. It functions as an acute phase-regulated receptor involved in the clearance and endocytosis of hemoglobin/haptoglobin complexes by macrophages, and may thereby protect tissues from free hemoglobin-mediated oxidative damage. This protein may also function as an innate immune sensor for bacteria and inducer of local inflammation. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. ENSG00000177575 CD163 molecule NA
FABP4 2167 FABP4 encodes the fatty acid binding protein found in adipocytes. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. It is thought that FABPs' roles include fatty acid uptake, transport, and metabolism. ENSG00000170323 fatty acid binding protein 4 NA
HBA1 3039 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. ENSG00000206172 hemoglobin subunit alpha 1 NA
NID1 4811 This gene encodes a member of the nidogen family of basement membrane glycoproteins. The protein interacts with several other components of basement membranes, and may play a role in cell interactions with the extracellular matrix. ENSG00000116962 nidogen 1 NA
TACC1 6867 This locus may represent a breast cancer candidate gene. It is located close to FGFR1 on a region of chromosome 8 that is amplified in some breast cancers. Three transcript variants encoding different isoforms have been found for this gene. ENSG00000147526 transforming acidic coiled-coil containing protein 1 NA
TIMP3 7078 This gene belongs to the TIMP gene family. The proteins encoded by this gene family are inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix (ECM). Expression of this gene is induced in response to mitogenic stimulation and this netrin domain-containing protein is localized to the ECM. Mutations in this gene have been associated with the autosomal dominant disorder Sorsby’s fundus dystrophy. ENSG00000100234 TIMP metallopeptidase inhibitor 3 NA
MTND2P28 ENSG00000225630 NA ENSG00000225630 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 2 pseudogene 28 NA
IGFBP7 3490 This gene encodes a member of the insulin-like growth factor (IGF)-binding protein (IGFBP) family. IGFBPs bind IGFs with high affinity, and regulate IGF availability in body fluids and tissues and modulate IGF binding to its receptors. This protein binds IGF-I and IGF-II with relatively low affinity, and belongs to a subfamily of low-affinity IGFBPs. It also stimulates prostacyclin production and cell adhesion. Alternatively spliced transcript variants encoding different isoforms have been described for this gene, and one variant has been associated with retinal arterial macroaneurysm (PMID:21835307). ENSG00000163453 insulin like growth factor binding protein 7 NA
COL8A1 1295 This gene encodes one of the two alpha chains of type VIII collagen. The gene product is a short chain collagen and a major component of the basement membrane of the corneal endothelium. The type VIII collagen fibril can be either a homo- or a heterotrimer. Alternatively spliced transcript variants encoding the same protein have been observed. ENSG00000144810 collagen type VIII alpha 1 NA
STAB1 23166 This gene encodes a large, transmembrane receptor protein which may function in angiogenesis, lymphocyte homing, cell adhesion, or receptor scavenging. The protein contains 7 fasciclin, 16 epidermal growth factor (EGF)-like, and 2 laminin-type EGF-like domains as well as a C-type lectin-like hyaluronan-binding Link module. The protein is primarily expressed on sinusoidal endothelial cells of liver, spleen, and lymph node. The receptor has been shown to endocytose ligands such as low density lipoprotein, Gram-positive and Gram-negative bacteria, and advanced glycosylation end products. Supporting its possible role as a scavenger receptor, the protein rapidly cycles between the plasma membrane and early endosomes. ENSG00000010327 stabilin 1 NA
C1QB 713 This gene encodes a major constituent of the human complement subcomponent C1q. C1q associates with C1r and C1s in order to yield the first component of the serum complement system. Deficiency of C1q has been associated with lupus erythematosus and glomerulonephritis. C1q is composed of 18 polypeptide chains: six A-chains, six B-chains, and six C-chains. Each chain contains a collagen-like region located near the N terminus and a C-terminal globular region. The A-, B-, and C-chains are arranged in the order A-C-B on chromosome 1. This gene encodes the B-chain polypeptide of human complement subcomponent C1q. ENSG00000173369 complement component 1, q subcomponent, B chain NA
P4HB 5034 This gene encodes the beta subunit of prolyl 4-hydroxylase, a highly abundant multifunctional enzyme that belongs to the protein disulfide isomerase family. When present as a tetramer consisting of two alpha and two beta subunits, this enzyme is involved in hydroxylation of prolyl residues in preprocollagen. This enzyme is also a disulfide isomerase containing two thioredoxin domains that catalyze the formation, breakage and rearrangement of disulfide bonds. Other known functions include its ability to act as a chaperone that inhibits aggregation of misfolded proteins in a concentration-dependent manner, its ability to bind thyroid hormone, its role in both the influx and efflux of S-nitrosothiol-bound nitric oxide, and its function as a subunit of the microsomal triglyceride transfer protein complex. ENSG00000185624 prolyl 4-hydroxylase subunit beta NA
EFEMP1 2202 This gene encodes a member of the fibulin family of extracellular matrix glycoproteins. Like all members of this family, the encoded protein contains tandemly repeated epidermal growth factor-like repeats followed by a C-terminus fibulin-type domain. This gene is upregulated in malignant gliomas and may play a role in the aggressive nature of these tumors. Mutations in this gene are associated with Doyne honeycomb retinal dystrophy. Alternatively spliced transcript variants that encode the same protein have been described. ENSG00000115380 EGF containing fibulin like extracellular matrix protein 1 NA
CHGB 1114 This gene encodes a tyrosine-sulfated secretory protein abundant in peptidergic endocrine cells and neurons. This protein may serve as a precursor for regulatory peptides. ENSG00000089199 chromogranin B NA
MYL9 10398 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000101335 myosin light chain 9 NA
CD36 948 The protein encoded by this gene is the fourth major glycoprotein of the platelet surface and serves as a receptor for thrombospondin in platelets and various cell lines. Since thrombospondins are widely distributed proteins involved in a variety of adhesive processes, this protein may have important functions as a cell adhesion molecule. It binds to collagen, thrombospondin, anionic phospholipids and oxidized LDL. It directly mediates cytoadherence of Plasmodium falciparum parasitized erythrocytes and it binds long chain fatty acids and may function in the transport and/or as a regulator of fatty acid transport. Mutations in this gene cause platelet glycoprotein deficiency. Multiple alternatively spliced transcript variants have been found for this gene. ENSG00000135218 CD36 molecule NA
CD74 972 The protein encoded by this gene associates with class II major histocompatibility complex (MHC) and is an important chaperone that regulates antigen presentation for immune response. It also serves as a cell surface receptor for the cytokine macrophage migration inhibitory factor (MIF) which, when bound to the encoded protein, initiates survival pathways and cell proliferation. This protein also interacts with amyloid precursor protein (APP) and suppresses the production of amyloid beta (Abeta). Multiple alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000019582 CD74 molecule NA
CLDN5 7122 This gene encodes a member of the claudin family. Claudins are integral membrane proteins and components of tight junction strands. Tight junction strands serve as a physical barrier to prevent solutes and water from passing freely through the paracellular space between epithelial or endothelial cell sheets. Mutations in this gene have been found in patients with velocardiofacial syndrome. Alternatively spliced transcript variants encoding the same protein have been found for this gene. ENSG00000184113 claudin 5 NA
SPTBN1 6711 Spectrin is an actin crosslinking and molecular scaffold protein that links the plasma membrane to the actin cytoskeleton, and functions in the determination of cell shape, arrangement of transmembrane proteins, and organization of organelles. It is composed of two antiparallel dimers of alpha- and beta- subunits. This gene is one member of a family of beta-spectrin genes. The encoded protein contains an N-terminal actin-binding domain, and 17 spectrin repeats which are involved in dimer formation. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000115306 spectrin beta, non-erythrocytic 1 NA
PIGR 5284 This gene is a member of the immunoglobulin superfamily. The encoded poly-Ig receptor binds polymeric immunoglobulin molecules at the basolateral surface of epithelial cells; the complex is then transported across the cell to be secreted at the apical surface. A significant association was found between immunoglobulin A nephropathy and several SNPs in this gene. ENSG00000162896 polymeric immunoglobulin receptor NA
PLA2G2A 5320 The protein encoded by this gene is a member of the phospholipase A2 family (PLA2). PLA2s constitute a diverse family of enzymes with respect to sequence, function, localization, and divalent cation requirements. This gene product belongs to group II, which contains the secreted form of PLA2, an extracellular enzyme that has a low molecular mass and requires calcium ions for catalysis. It catalyzes the hydrolysis of the sn-2 fatty acid acyl ester bond of phosphoglycerides, releasing free fatty acids and lysophospholipids, and is thought to participate in the regulation of the phospholipid metabolism in biomembranes. Several alternatively spliced transcript variants with different 5’ UTRs have been found for this gene. ENSG00000188257 phospholipase A2 group IIA NA
ITM2B 9445 Amyloid precursor proteins are processed by beta-secretase and gamma-secretase to produce beta-amyloid peptides which form the characteristic plaques of Alzheimer disease. This gene encodes a transmembrane protein which is processed at the C-terminus by furin or furin-like proteases to produce a small secreted peptide which inhibits the deposition of beta-amyloid. Mutations which result in extension of the C-terminal end of the encoded protein, thereby increasing the size of the secreted peptide, are associated with two neurodegenerative diseases, familial British dementia and familial Danish dementia. ENSG00000136156 integral membrane protein 2B NA
ACKR1 2532 The protein encoded by this gene is a glycosylated membrane protein and a non-specific receptor for several chemokines. The encoded protein is the receptor for the human malarial parasites Plasmodium vivax and Plasmodium knowlesi. Polymorphisms in this gene are the basis of the Duffy blood group system. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000213088 atypical chemokine receptor 1 (Duffy blood group) NA
C1QC 714 This gene encodes a major constituent of the human complement subcomponent C1q. C1q associates with C1r and C1s in order to yield the first component of the serum complement system. A deficiency in C1q has been associated with lupus erythematosus and glomerulonephritis. C1q is composed of 18 polypeptide chains: six A-chains, six B-chains, and six C-chains. Each chain contains a collagen-like region located near the N-terminus, and a C-terminal globular region. The A-, B-, and C-chains are arranged in the order A-C-B on chromosome 1. This gene encodes the C-chain polypeptide of human complement subcomponent C1q. Alternatively spliced transcript variants that encode the same protein have been found for this gene. ENSG00000159189 complement component 1, q subcomponent, C chain NA
VCAN 1462 This gene is a member of the aggrecan/versican proteoglycan family. The protein encoded is a large chondroitin sulfate proteoglycan and is a major component of the extracellular matrix. This protein is involved in cell adhesion, proliferation, migration and angiogenesis and plays a central role in tissue morphogenesis and maintenance. Mutations in this gene are the cause of Wagner syndrome type 1. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000038427 versican NA
FMOD 2331 Fibromodulin belongs to the family of small interstitial proteoglycans. The encoded protein possesses a central region containing leucine-rich repeats with 4 keratan sulfate chains, flanked by terminal domains containing disulphide bonds. Owing to the interaction with type I and type II collagen fibrils and in vitro inhibition of fibrillogenesis, the encoded protein may play a role in the assembly of extracellular matrix. It may also regulate TGF-beta activities by sequestering TGF-beta into the extracellular matrix. Sequence variations in this gene may be associated with the pathogenesis of high myopia. Alternative splicing results in multiple transcript variants. ENSG00000122176 fibromodulin NA
LTBP1 4052 The protein encoded by this gene belongs to the family of latent TGF-beta binding proteins (LTBPs). The secretion and activation of TGF-betas is regulated by their association with latency-associated proteins and with latent TGF-beta binding proteins. The product of this gene targets latent complexes of transforming growth factor beta to the extracellular matrix, where the latent cytokine is subsequently activated by several different mechanisms. Alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000049323 latent transforming growth factor beta binding protein 1 NA
RGS5 8490 This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. ENSG00000143248 regulator of G-protein signaling 5 NA
MYO1D 4642 NA ENSG00000176658 myosin ID NA
IGHA2 ENSG00000211890 NA ENSG00000211890 immunoglobulin heavy constant alpha 2 (A2m marker) NA
MYO1C 4641 This gene encodes a member of the unconventional myosin protein family, which are actin-based molecular motors. The protein is found in the cytoplasm, and one isoform with a unique N-terminus is also found in the nucleus. The nuclear isoform associates with RNA polymerase I and II and functions in transcription initiation. The mouse ortholog of this protein also functions in intracellular vesicle transport to the plasma membrane. Multiple transcript variants encoding different isoforms have been found for this gene. The related gene myosin IE has been referred to as myosin IC in the literature, but it is a distinct locus on chromosome 19. ENSG00000197879 myosin IC NA
ITPKB 3707 The protein encoded by this gene regulates inositol phosphate metabolism by phosphorylation of second messenger inositol 1,4,5-trisphosphate to Ins(1,3,4,5)P4. The activity of this encoded protein is responsible for regulating the levels of a large number of inositol polyphosphates that are important in cellular signaling. Both calcium/calmodulin and protein phosphorylation mechanisms control its activity. ENSG00000143772 inositol-trisphosphate 3-kinase B NA
CPE 1363 This gene encodes a member of the M14 family of metallocarboxypeptidases. The encoded preproprotein is proteolytically processed to generate the mature peptidase. This peripheral membrane protein cleaves C-terminal amino acid residues and is involved in the biosynthesis of peptide hormones and neurotransmitters, including insulin. This protein may also function independently of its peptidase activity, as a neurotrophic factor that promotes neuronal survival, and as a sorting receptor that binds to regulated secretory pathway proteins, including prohormones. Mutations in this gene are implicated in type 2 diabetes. ENSG00000109472 carboxypeptidase E NA
PLIN1 5346 The protein encoded by this gene coats lipid storage droplets in adipocytes, thereby protecting them until they can be broken down by hormone-sensitive lipase. The encoded protein is the major cAMP-dependent protein kinase substrate in adipocytes and, when unphosphorylated, may play a role in the inhibition of lipolysis. Alternatively spliced transcript variants varying in the 5’ UTR, but encoding the same protein, have been found for this gene. ENSG00000166819 perilipin 1 NA
C1QA 712 This gene encodes a major constituent of the human complement subcomponent C1q. C1q associates with C1r and C1s in order to yield the first component of the serum complement system. Deficiency of C1q has been associated with lupus erythematosus and glomerulonephritis. C1q is composed of 18 polypeptide chains: six A-chains, six B-chains, and six C-chains. Each chain contains a collagen-like region located near the N terminus and a C-terminal globular region. The A-, B-, and C-chains are arranged in the order A-C-B on chromosome 1. This gene encodes the A-chain polypeptide of human complement subcomponent C1q. ENSG00000173372 complement component 1, q subcomponent, A chain NA
MYH10 4628 This gene encodes a member of the myosin superfamily. The protein represents a conventional non-muscle myosin; it should not be confused with the unconventional myosin-10 (MYO10). Myosins are actin-dependent motor proteins with diverse functions including regulation of cytokinesis, cell motility, and cell polarity. Mutations in this gene have been associated with May-Hegglin anomaly and developmental defects in brain and heart. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000133026 myosin, heavy chain 10, non-muscle NA
AEBP1 165 This gene encodes a member of the carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. ENSG00000106624 AE binding protein 1 NA
PMP22 5376 This gene encodes an integral membrane protein that is a major component of myelin in the peripheral nervous system. Studies suggest two alternately used promoters drive tissue-specific expression. Various mutations of this gene are causes of Charcot-Marie-Tooth disease Type IA, Dejerine-Sottas syndrome, and hereditary neuropathy with liability to pressure palsies. Alternative splicing results in multiple transcript variants. ENSG00000109099 peripheral myelin protein 22 NA
LIPE 3991 The protein encoded by this gene has a long and a short form, generated by use of alternative translational start codons. The long form is expressed in steroidogenic tissues such as testis, where it converts cholesteryl esters to free cholesterol for steroid hormone production. The short form is expressed in adipose tissue, among others, where it hydrolyzes stored triglycerides to free fatty acids. ENSG00000079435 lipase E, hormone sensitive type NA
CILP 8483 Major alterations in the composition of the cartilage extracellular matrix occur in joint disease, such as osteoarthrosis. This gene encodes the cartilage intermediate layer protein (CILP), which increases in early osteoarthrosis cartilage. The gene was thought to encode a protein precursor for two different proteins, an N-terminal CILP and a C-terminal homolog of NTPPHase; however, later studies identified no nucleotide pyrophosphatase phosphodiesterase (NPP) activity. The full-length protein and its N-terminal domain were shown to function as an IGF-1 antagonist. An allelic variant of this gene has been associated with lumbar disc disease. ENSG00000138615 cartilage intermediate layer protein NA
NOV 4856 The protein encoded by this gene is a small secreted cysteine-rich protein and a member of the CCN family of regulatory proteins. CCN family proteins associate with the extracellular matrix and play an important role in cardiovascular and skeletal development, fibrosis and cancer development. ENSG00000136999 nephroblastoma overexpressed NA
CSDC2 27254 NA ENSG00000172346 cold shock domain containing C2 NA
MFGE8 4240 This gene encodes a preproprotein that is proteolytically processed to form multiple protein products. The major encoded protein product, lactadherin, is a membrane glycoprotein that promotes phagocytosis of apoptotic cells. This protein has also been implicated in wound healing, autoimmune disease, and cancer. Lactadherin can be further processed to form a smaller cleavage product, medin, which comprises the major protein component of aortic medial amyloid (AMA). Alternative splicing results in multiple transcript variants. ENSG00000140545 milk fat globule-EGF factor 8 protein NA
CASKIN2 57513 This gene encodes a large protein that contains six ankyrin repeats, as well as a Src homology 3 (SH3) domain and two sterile alpha motif (SAM) domains, which may be involved in protein-protein interactions. The C-terminal portion of this protein is proline-rich and contains a conserved region. A related protein interacts with calcium/calmodulin-dependent serine protein kinase (CASK). Alternative splicing results in multiple transcript variants. ENSG00000177303 CASK interacting protein 2 NA
ACVRL1 94 This gene encodes a type I cell-surface receptor for the TGF-beta superfamily of ligands. It shares with other type I receptors a high degree of similarity in serine-threonine kinase subdomains, a glycine- and serine-rich region (called the GS domain) preceding the kinase domain, and a short C-terminal tail. The encoded protein, sometimes termed ALK1, shares similar domain structures with other closely related ALK or activin receptor-like kinase proteins that form a subfamily of receptor serine/threonine kinases. Mutations in this gene are associated with hemorrhagic telangiectasia type 2, also known as Rendu-Osler-Weber syndrome 2. ENSG00000139567 activin A receptor like type 1 NA
COL5A3 50509 This gene encodes an alpha chain for one of the low abundance fibrillar collagens. Fibrillar collagen molecules are trimers that can be composed of one or more types of alpha chains. Type V collagen is found in tissues containing type I collagen and appears to regulate the assembly of heterotypic fibers composed of both type I and type V collagen. This gene product is closely related to type XI collagen and it is possible that the collagen chains of types V and XI constitute a single collagen type with tissue-specific chain combinations. Mutations in this gene are thought to be responsible for the symptoms of a subset of patients with Ehlers-Danlos syndrome type III. Messages of several sizes can be detected in northern blots but sequence information cannot confirm the identity of the shorter messages. ENSG00000080573 collagen type V alpha 3 NA
EGFL7 51162 This gene encodes a secreted endothelial cell protein that contains two epidermal growth factor-like domains. The encoded protein may play a role in regulating vasculogenesis. This protein may be involved in the growth and proliferation of tumor cells. Alternate splicing results in multiple transcript variants. ENSG00000172889 EGF like domain multiple 7 NA
SAMHD1 25939 This gene may play a role in regulation of the innate immune response. The encoded protein is upregulated in response to viral infection and may be involved in mediation of tumor necrosis factor-alpha proinflammatory responses. Mutations in this gene have been associated with Aicardi-Goutieres syndrome. ENSG00000101347 SAM and HD domain containing deoxynucleoside triphosphate triphosphohydrolase 1 NA
ITGA11 22801 This gene encodes an alpha integrin. Integrins are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain. This protein contains an I domain, is expressed in muscle tissue, dimerizes with beta 1 integrin in vitro, and appears to bind collagen in this form. Therefore, the protein may be involved in attaching muscle tissue to the extracellular matrix. Alternative transcriptional splice variants have been found for this gene, but their biological validity is not determined. ENSG00000137809 integrin subunit alpha 11 NA
TSPAN15 23555 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. The use of alternate polyadenylation sites has been found for this gene. ENSG00000099282 tetraspanin 15 NA
APOL3 80833 This gene is a member of the apolipoprotein L gene family, and it is present in a cluster with other family members on chromosome 22. The encoded protein is found in the cytoplasm, where it may affect the movement of lipids, including cholesterol, and/or allow the binding of lipids to organelles. In addition, expression of this gene is up-regulated by tumor necrosis factor-alpha in endothelial cells lining the normal and atherosclerotic iliac artery and aorta. Alternative splicing results in multiple transcript variants. ENSG00000128284 apolipoprotein L3 NA
TPSAB1 7177 Tryptases comprise a family of trypsin-like serine proteases, the peptidase family S1. Tryptases are enzymatically active only as heparin-stabilized tetramers, and they are resistant to all known endogenous proteinase inhibitors. Several tryptase genes are clustered on chromosome 16p13.3. These genes are characterized by several distinct features. They have a highly conserved 3’ UTR and contain tandem repeat sequences at the 5’ flank and 3’ UTR which are thought to play a role in regulation of the mRNA stability. These genes have an intron immediately upstream of the initiator Met codon, which separates the site of transcription initiation from protein coding sequence. This feature is characteristic of tryptases but is unusual in other genes. The alleles of this gene exhibit an unusual amount of sequence variation, such that the alleles were once thought to represent two separate genes, alpha and beta 1. Beta tryptases appear to be the main isoenzymes expressed in mast cells; whereas in basophils, alpha tryptases predominate. Tryptases have been implicated as mediators in the pathogenesis of asthma and other allergic and inflammatory disorders. ENSG00000172236 tryptase alpha/beta 1 NA
AOC3 8639 This gene encodes a member of the semicarbazide-sensitive amine oxidase family. Copper amine oxidases catalyze the oxidative conversion of amines to aldehydes in the presence of copper and quinone cofactor. The encoded protein is localized to the cell surface, has adhesive properties as well as monoamine oxidase activity, and may be involved in leukocyte trafficking. Alterations in levels of the encoded protein may be associated with many diseases, including diabetes mellitus. A pseudogene of this gene has been described and is located approximately 9-kb downstream on the same chromosome. Alternative splicing results in multiple transcript variants. ENSG00000131471 amine oxidase, copper containing 3 NA
SCD 6319 This gene encodes an enzyme involved in fatty acid biosynthesis, primarily the synthesis of oleic acid. The protein belongs to the fatty acid desaturase family and is an integral membrane protein located in the endoplasmic reticulum. Transcripts of approximately 3.9 and 5.2 kb, differing only by alternative polyadenylation signals, have been detected. A gene encoding a similar enzyme is located on chromosome 4 and a pseudogene of this gene is located on chromosome 17. ENSG00000099194 stearoyl-CoA desaturase NA
THBS4 7060 The protein encoded by this gene belongs to the thrombospondin protein family. Thrombospondin family members are adhesive glycoproteins that mediate cell-to-cell and cell-to-matrix interactions. This protein forms a pentamer and can bind to heparin and calcium. It is involved in local signaling in the developing and adult nervous system, and it contributes to spinal sensitization and neuropathic pain states. This gene is activated during the stromal response to invasive breast cancer. It may also play a role in inflammatory responses in Alzheimer’s disease. Alternative splicing results in multiple transcript variants. ENSG00000113296 thrombospondin 4 NA
NA NA NA ENSG00000259716 NA TRUE
PTPRB 5787 The protein encoded by this gene is a member of the protein tyrosine phosphatase (PTP) family. PTPs are known to be signaling molecules that regulate a variety of cellular processes including cell growth, differentiation, mitotic cycle, and oncogenic transformation. This PTP contains an extracellular domain, a single transmembrane segment and one intracytoplasmic catalytic domain, thus belongs to receptor type PTP. The extracellular region of this PTP is composed of multiple fibronectin type_III repeats, which was shown to interact with neuronal receptor and cell adhesion molecules, such as contactin and tenascin C. This protein was also found to interact with sodium channels, and thus may regulate sodium channels by altering tyrosine phosphorylation status. The functions of the interaction partners of this protein implicate the roles of this PTP in cell adhesion, neurite growth, and neuronal differentiation. Alternate transcript variants encoding different isoforms have been found for this gene. ENSG00000127329 protein tyrosine phosphatase, receptor type B NA
CIDEC 63924 This gene encodes a member of the cell death-inducing DNA fragmentation factor-like effector family. Members of this family play important roles in apoptosis. The encoded protein promotes lipid droplet formation in adipocytes and may mediate adipocyte apoptosis. This gene is regulated by insulin and its expression is positively correlated with insulin sensitivity. Mutations in this gene may contribute to insulin resistant diabetes. A pseudogene of this gene is located on the short arm of chromosome 3. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. ENSG00000187288 cell death inducing DFFA like effector c NA
PIK3R1 5295 Phosphatidylinositol 3-kinase phosphorylates the inositol ring of phosphatidylinositol at the 3-prime position. The enzyme comprises a 110 kD catalytic subunit and a regulatory subunit of either 85, 55, or 50 kD. This gene encodes the 85 kD regulatory subunit. Phosphatidylinositol 3-kinase plays an important role in the metabolic actions of insulin, and a mutation in this gene has been associated with insulin resistance. Alternative splicing of this gene results in four transcript variants encoding different isoforms. ENSG00000145675 phosphoinositide-3-kinase regulatory subunit 1 NA
PIK3R3 8503 NA ENSG00000117461 phosphoinositide-3-kinase regulatory subunit 3 NA
VIM 7431 This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. ENSG00000026025 vimentin NA
RP5-1142A6.9 ENSG00000260121 NA ENSG00000260121 NA NA
PNMAL1 55228 NA ENSG00000182013 paraneoplastic Ma antigen family-like 1 NA
JUNB 3726 NA ENSG00000171223 JunB proto-oncogene, AP-1 transcription factor subunit NA
HYAL2 8692 This gene encodes a weak acid-active hyaluronidase. The encoded protein is similar in structure to other more active hyaluronidases. Hyaluronidases degrade hyaluronan, one of the major glycosaminoglycans of the extracellular matrix. Hyaluronan and fragments of hyaluronan are thought to be involved in cell proliferation, migration and differentiation. Although it was previously thought to be a lysosomal hyaluronidase that is active at a pH below 4, the encoded protein is likely a GPI-anchored cell surface protein. This hyaluronidase serves as a receptor for the oncogenic virus Jaagsiekte sheep retrovirus. The gene is one of several related genes in a region of chromosome 3p21.3 associated with tumor suppression. This gene encodes two alternatively spliced transcript variants which differ only in the 5’ UTR. ENSG00000068001 hyaluronoglucosaminidase 2 NA
S1PR1 1901 The protein encoded by this gene is structurally similar to G protein-coupled receptors and is highly expressed in endothelial cells. It binds the ligand sphingosine-1-phosphate with high affinity and high specificity, and is suggested to be involved in the processes that regulate the differentiation of endothelial cells. Activation of this receptor induces cell-cell adhesion. Alternative splicing results in multiple transcript variants. ENSG00000170989 sphingosine-1-phosphate receptor 1 NA
CPM 1368 The protein encoded by this gene is a membrane-bound arginine/lysine carboxypeptidase. Its expression is associated with monocyte to macrophage differentiation. This encoded protein contains hydrophobic regions at the amino and carboxy termini and has 6 potential asparagine-linked glycosylation sites. The active site residues of carboxypeptidases A and B are conserved in this protein. Three alternatively spliced transcript variants encoding the same protein have been described for this gene. ENSG00000135678 carboxypeptidase M NA
CELF2 10659 Members of the CELF/BRUNOL protein family contain two N-terminal RNA recognition motif (RRM) domains, one C-terminal RRM domain, and a divergent segment of 160-230 aa between the second and third RRM domains. Members of this protein family regulate pre-mRNA alternative splicing and may also be involved in mRNA editing, and translation. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000048740 CUGBP, Elav-like family member 2 NA
FKBP8 23770 The protein encoded by this gene is a member of the immunophilin protein family, which play a role in immunoregulation and basic cellular processes involving protein folding and trafficking. Unlike the other members of the family, this encoded protein does not seem to have PPIase/rotamase activity. It may have a role in neurons associated with memory function. ENSG00000105701 FK506 binding protein 8 NA
NA NA NA ENSG00000256545 NA TRUE
TMC6 11322 Epidermodysplasia verruciformis (EV) is an autosomal recessive dermatosis characterized by abnormal susceptibility to human papillomaviruses (HPVs) and a high rate of progression to squamous cell carcinoma on sun-exposed skin. EV is caused by mutations in either of two adjacent genes located on chromosome 17q25.3. Both of these genes encode integral membrane proteins that localize to the endoplasmic reticulum and are predicted to form transmembrane channels. This gene encodes a transmembrane channel-like protein with 10 transmembrane domains and 2 leucine zipper motifs. ENSG00000141524 transmembrane channel like 6 NA
MGLL 11343 This gene encodes a serine hydrolase of the AB hydrolase superfamily that catalyzes the conversion of monoacylglycerides to free fatty acids and glycerol. The encoded protein plays a critical role in several physiological processes including pain and nociperception through hydrolysis of the endocannabinoid 2-arachidonoylglycerol. Expression of this gene may play a role in cancer tumorigenesis and metastasis. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000074416 monoglyceride lipase NA
TIE1 7075 This gene encodes a member of the tyrosine protein kinase family. The encoded protein plays a critical role in angiogenesis and blood vessel stability by inhibiting angiopoietin 1 signaling through the endothelial receptor tyrosine kinase Tie2. Ectodomain cleavage of the encoded protein relieves inhibition of Tie2 and is mediated by multiple factors including vascular endothelial growth factor. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000066056 tyrosine kinase with immunoglobulin like and EGF like domains 1 NA
CFD 1675 This gene encodes a member of the S1, or chymotrypsin, family of serine peptidases. This protease catalyzes the cleavage of factor B, the rate-limiting step of the alternative pathway of complement activation. This protein also functions as an adipokine, a cell signaling protein secreted by adipocytes, which regulates insulin secretion in mice. Mutations in this gene underlie complement factor D deficiency, which is associated with recurrent bacterial meningitis infections in human patients. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate the mature protease. ENSG00000197766 complement factor D NA
UBB 7314 This gene encodes ubiquitin, one of the most conserved proteins known. Ubiquitin has a major role in targeting cellular proteins for degradation by the 26S proteasome. It is also involved in the maintenance of chromatin structure, the regulation of gene expression, and the stress response. Ubiquitin is synthesized as a precursor protein consisting of either polyubiquitin chains or a single ubiquitin moiety fused to an unrelated protein. This gene consists of three direct repeats of the ubiquitin coding sequence with no spacer sequence. Consequently, the protein is expressed as a polyubiquitin precursor with a final amino acid after the last repeat. An aberrant form of this protein has been detected in patients with Alzheimer’s disease and Down syndrome. Pseudogenes of this gene are located on chromosomes 1, 2, 13, and 17. Alternative splicing results in multiple transcript variants. ENSG00000170315 ubiquitin B NA
PPP1R3C 5507 This gene encodes a regulatory subunit of protein phosphatase-1 (PP1). PP1 catalyzes reversible protein phosphorylation, which is important in a wide range of cellular activities: neuronal, muscular, RNA splicing, protein synthesis, cell death, and glycogen metabolism, to name just a few. By interacting with different regulatory subunits, PP1 is directed to different parts of the cell, to different substrates, or to respond to extracellular signals. ENSG00000119938 protein phosphatase 1 regulatory subunit 3C NA
SYNM 23336 The protein encoded by this gene is an intermediate filament (IF) family member. IF proteins are cytoskeletal proteins that confer resistance to mechanical stress and are encoded by a dispersed multigene family. This protein has been found to form a linkage between desmin, which is a subunit of the IF network, and the extracellular matrix, and provides an important structural support in muscle. Two alternatively spliced variants encoding different isoforms have been described for this gene. ENSG00000182253 synemin NA
SDC2 6383 The protein encoded by this gene is a transmembrane (type I) heparan sulfate proteoglycan and is a member of the syndecan proteoglycan family. The syndecans mediate cell binding, cell signaling, and cytoskeletal organization and syndecan receptors are required for internalization of the HIV-1 tat protein. The syndecan-2 protein functions as an integral membrane protein and participates in cell proliferation, cell migration and cell-matrix interactions via its receptor for extracellular matrix proteins. Altered syndecan-2 expression has been detected in several different tumor types. ENSG00000169439 syndecan 2 NA
CGA 1081 The four human glycoprotein hormones chorionic gonadotropin (CG), luteinizing hormone (LH), follicle stimulating hormone (FSH), and thyroid stimulating hormone (TSH) are dimers consisting of alpha and beta subunits that are associated noncovalently. The alpha subunits of these hormones are identical, however, their beta chains are unique and confer biological specificity. The protein encoded by this gene is the alpha subunit and belongs to the glycoprotein hormones alpha chain family. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000135346 glycoprotein hormones, alpha polypeptide NA
NOTCH1 4851 This gene encodes a member of the NOTCH family of proteins. Members of this Type I transmembrane protein family share structural characteristics including an extracellular domain consisting of multiple epidermal growth factor-like (EGF) repeats, and an intracellular domain consisting of multiple different domain types. Notch signaling is an evolutionarily conserved intercellular signaling pathway that regulates interactions between physically adjacent cells through binding of Notch family receptors to their cognate ligands. The encoded preproprotein is proteolytically processed in the trans-Golgi network to generate two polypeptide chains that heterodimerize to form the mature cell-surface receptor. This receptor plays a role in the development of numerous cell and tissue types. Mutations in this gene are associated with aortic valve disease, Adams-Oliver syndrome, T-cell acute lymphoblastic leukemia, chronic lymphocytic leukemia, and head and neck squamous cell carcinoma. ENSG00000148400 notch 1 NA
MTURN 222166 NA ENSG00000180354 maturin, neural progenitor differentiation regulator homolog (Xenopus) NA
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 18, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
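
Each factor's chunk repeats the same write step with the factor index typed in by hand, which makes it easy for the file name and the queried factor to drift apart. The sketch below shows one way to wrap that step in a function. The helper name `write_factor_genes`, its arguments, and the choice to drop missing and duplicated IDs are assumptions made for illustration; they are not part of the original analysis, which writes `out$query` unfiltered.

```r
# Hypothetical helper (not in the original analysis): writes the query IDs
# for factor k to "<out_dir>/gene_names_clus_<k>.txt", one ID per line.
# Assumption: missing, empty, and duplicated IDs are dropped before writing.
write_factor_genes <- function(ids, k, out_dir) {
  stopifnot(length(k) == 1, is.numeric(k), k >= 1)
  ids <- as.character(ids)
  ids <- unique(ids[!is.na(ids) & nzchar(ids)])
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  writeLines(ids, path)
  invisible(path)
}

# Toy usage with made-up input and a temporary directory.
demo_ids <- c("ENSG00000183091", "ENSG00000196091", NA, "ENSG00000183091")
demo_path <- write_factor_genes(demo_ids, k = 19, out_dir = tempdir())
readLines(demo_path)
```

In the notebook this would presumably be called as `write_factor_genes(out$query, k, "../utilities/GTEX2013_sparse_fac_sqrt")` right after the corresponding `queryMany` call, so that the same `k` drives both the query and the file name.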

Factor 19 Annotations

out <- mygene::queryMany(gene_list[19,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name query symbol summary X_id notfound
nebulin ENSG00000183091 NEB This gene encodes nebulin, a giant protein component of the cytoskeletal matrix that coexists with the thick and thin filaments within the sarcomeres of skeletal muscle. In most vertebrates, nebulin accounts for 3 to 4% of the total myofibrillar protein. The encoded protein contains approximately 30-amino acid long modules that can be classified into 7 types and other repeated modules. Protein isoform sizes vary from 600 to 800 kD due to alternative splicing that is tissue-, species-, and developmental stage-specific. Of the 183 exons in the nebulin gene, at least 43 are alternatively spliced, although exons 143 and 144 are not found in the same transcript. Of the several thousand transcript variants predicted for nebulin, the RefSeq Project has decided to create three representative RefSeq records. Mutations in this gene are associated with recessive nemaline myopathy. 4703 NA
myosin binding protein C, slow type ENSG00000196091 MYBPC1 This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 4604 NA
tropomyosin 1 (alpha) ENSG00000140416 TPM1 This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. 7168 NA
actin, alpha, cardiac muscle 1 ENSG00000159251 ACTC1 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). 70 NA
troponin T2, cardiac type ENSG00000118194 TNNT2 The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms, however, the full-length nature of some of these variants has not yet been determined. 7139 NA
myosin, heavy chain 1, skeletal muscle, adult ENSG00000109061 MYH1 Myosin is a major contractile protein which converts chemical energy into mechanical energy through the hydrolysis of ATP. Myosin is a hexameric protein composed of a pair of myosin heavy chains (MYH) and two pairs of nonidentical light chains. Myosin heavy chains are encoded by a multigene family. In mammals at least 10 different myosin heavy chain (MYH) isoforms have been described from striated, smooth, and nonmuscle cells. These isoforms show expression that is spatially and temporally regulated during development. 4619 NA
myosin, heavy chain 6, cardiac muscle, alpha ENSG00000197616 MYH6 Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. 4624 NA
natriuretic peptide A ENSG00000175206 NPPA The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. 4878 NA
myosin binding protein C, cardiac ENSG00000134571 MYBPC3 MYBPC3 encodes the cardiac isoform of myosin-binding protein C. Myosin-binding protein C is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. MYBPC3, the cardiac isoform, is expressed exclusively in heart muscle. Regulatory phosphorylation of the cardiac isoform in vivo by cAMP-dependent protein kinase (PKA) upon adrenergic stimulation may be linked to modulation of cardiac contraction. Mutations in MYBPC3 are one cause of familial hypertrophic cardiomyopathy. 4607 NA
tumor protein, translationally-controlled 1 ENSG00000133112 TPT1 NA 7178 NA
pyruvate dehydrogenase kinase 4 ENSG00000004799 PDK4 This gene is a member of the PDK/BCKDK protein kinase family and encodes a mitochondrial protein with a histidine kinase domain. This protein is located in the matrix of the mitochondria and inhibits the pyruvate dehydrogenase complex by phosphorylating one of its subunits, thereby contributing to the regulation of glucose metabolism. Expression of this gene is regulated by glucocorticoids, retinoic acid and insulin. 5166 NA
ryanodine receptor 1 ENSG00000196218 RYR1 This gene encodes a ryanodine receptor found in skeletal muscle. The encoded protein functions as a calcium release channel in the sarcoplasmic reticulum but also serves to connect the sarcoplasmic reticulum and transverse tubule. Mutations in this gene are associated with malignant hyperthermia susceptibility, central core disease, and minicore myopathy with external ophthalmoplegia. Alternatively spliced transcripts encoding different isoforms have been described. 6261 NA
troponin C2, fast skeletal type ENSG00000101470 TNNC2 Troponin (Tn), a key protein complex in the regulation of striated muscle contraction, is composed of 3 subunits. The Tn-I subunit inhibits actomyosin ATPase, the Tn-T subunit binds tropomyosin and Tn-C, while the Tn-C subunit binds calcium and overcomes the inhibitory action of the troponin complex on actin filaments. The protein encoded by this gene is the Tn-C subunit. 7125 NA
myosin light chain 7 ENSG00000106631 MYL7 NA 58498 NA
ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 1 ENSG00000196296 ATP2A1 This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol to the sarcoplasmic reticulum lumen, and is involved in muscular excitation and contraction. Mutations in this gene cause some autosomal recessive forms of Brody disease, characterized by increasing impairment of muscular relaxation during exercise. Alternative splicing results in three transcript variants encoding different isoforms. 487 NA
myosin, heavy chain 2, skeletal muscle, adult ENSG00000125414 MYH2 Myosins are actin-based motor proteins that function in the generation of mechanical force in eukaryotic cells. Muscle myosins are heterohexamers composed of 2 myosin heavy chains and 2 pairs of nonidentical myosin light chains. This gene encodes a member of the class II or conventional myosin heavy chains, and functions in skeletal muscle contraction. This gene is found in a cluster of myosin heavy chain genes on chromosome 17. A mutation in this gene results in inclusion body myopathy-3. Multiple alternatively spliced variants, encoding the same protein, have been identified. 4620 NA
dickkopf WNT signaling pathway inhibitor 3 ENSG00000050165 DKK3 This gene encodes a protein that is a member of the dickkopf family. The secreted protein contains two cysteine rich regions and is involved in embryonic development through its interactions with the Wnt signaling pathway. The expression of this gene is decreased in a variety of cancer cell lines and it may function as a tumor suppressor gene. Alternative splicing results in multiple transcript variants encoding the same protein. 27122 NA
carboxypeptidase A1 ENSG00000091704 CPA1 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. 1357 NA
myosin, heavy chain 11, smooth muscle ENSG00000133392 MYH11 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. 4629 NA
NPPA antisense RNA 1 ENSG00000242349 NPPA-AS1 NA ENSG00000242349 NA
myosin binding protein C, fast type ENSG00000086967 MYBPC2 This gene encodes a member of the myosin-binding protein C family. This family includes the fast-, slow- and cardiac-type isoforms, each of which is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The protein encoded by this locus is referred to as the fast-type isoform. Mutations in the related but distinct genes encoding the slow-type and cardiac-type isoforms have been associated with distal arthrogryposis, type 1 and hypertrophic cardiomyopathy, respectively. 4606 NA
carboxypeptidase B1 ENSG00000153002 CPB1 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma. 1360 NA
ankyrin repeat domain 1 ENSG00000148677 ANKRD1 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. 27063 NA
protease, serine 1 ENSG00000204983 PRSS1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. 5644 NA
bridging integrator 1 ENSG00000136717 BIN1 This gene encodes several isoforms of a nucleocytoplasmic adaptor protein, one of which was initially identified as a MYC-interacting protein with features of a tumor suppressor. Isoforms that are expressed in the central nervous system may be involved in synaptic vesicle endocytosis and may interact with dynamin, synaptojanin, endophilin, and clathrin. Isoforms that are expressed in muscle and ubiquitously expressed isoforms localize to the cytoplasm and nucleus and activate a caspase-independent apoptotic process. Studies in mouse suggest that this gene plays an important role in cardiac muscle development. Alternate splicing of the gene results in several transcript variants encoding different isoforms. Aberrant splice variants expressed in tumor cell lines have also been described. 274 NA
phosphorylase, glycogen; brain ENSG00000100994 PYGB The protein encoded by this gene is a glycogen phosphorylase found predominantly in the brain. The encoded protein forms homodimers which can associate into homotetramers, the enzymatically active form of glycogen phosphorylase. The activity of this enzyme is positively regulated by AMP and negatively regulated by ATP, ADP, and glucose-6-phosphate. This enzyme catalyzes the rate-determining step in glycogen degradation. 5834 NA
myosin light chain 1 ENSG00000168530 MYL1 Myosin is a hexameric ATPase cellular motor protein. It is composed of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable regulatory light chains. This gene encodes a myosin alkali light chain expressed in fast skeletal muscle. Two transcript variants have been identified for this gene. 4632 NA
fibronectin 1 ENSG00000115414 FN1 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. 2335 NA
troponin I3, cardiac type ENSG00000129991 TNNI3 Troponin I (TnI), along with troponin T (TnT) and troponin C (TnC), is one of 3 subunits that form the troponin complex of the thin filaments of striated muscle. TnI is the inhibitory subunit, blocking actin-myosin interactions and thereby mediating striated muscle relaxation. The TnI subfamily contains three genes: TnI-skeletal-fast-twitch, TnI-skeletal-slow-twitch, and TnI-cardiac. This gene encodes the TnI-cardiac protein and is exclusively expressed in cardiac muscle tissues. Mutations in this gene cause familial hypertrophic cardiomyopathy type 7 (CMH7) and familial restrictive cardiomyopathy (RCM). 7137 NA
NA ENSG00000273149 RP11-290D2.6 NA ENSG00000273149 NA
cysteine rich protein 2 ENSG00000182809 CRIP2 This gene encodes a putative transcription factor with two LIM zinc-binding domains. The encoded protein may participate in the differentiation of smooth muscle tissue. Alternative splicing results in multiple transcript variants. 1397 NA
nuclear paraspeckle assembly transcript 1 (non-protein coding) ENSG00000245532 NEAT1 This gene produces a long non-coding RNA (lncRNA) transcribed from the multiple endocrine neoplasia locus. This lncRNA is retained in the nucleus where it forms the core structural component of the paraspeckle sub-organelles. It may act as a transcriptional regulator for numerous genes, including some genes involved in cancer progression. 283131 NA
pancreatic lipase ENSG00000175535 PNLIP This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. 5406 NA
glycoprotein 2 ENSG00000169347 GP2 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. 2813 NA
Y-box binding protein 3 ENSG00000060138 YBX3 NA 8531 NA
kelch like family member 41 ENSG00000239474 KLHL41 This gene is a member of the kelch-like family. The encoded protein contains a BACK domain, a BTB/POZ domain, and 5 Kelch repeats. This protein is thought to function in skeletal muscle development and maintenance. Mutations in this gene have been associated with nemaline myopathy (NM), a rare congenital muscle disorder. 10324 NA
zinc finger AN1-type containing 5 ENSG00000107372 ZFAND5 NA 7763 NA
hemoglobin subunit beta ENSG00000244734 HBB The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin cause beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. 3043 NA
keratin 10 ENSG00000186395 KRT10 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. 3858 NA
myosin light chain, phosphorylatable, fast skeletal muscle ENSG00000180209 MYLPF NA 29895 NA
troponin I2, fast skeletal type ENSG00000130598 TNNI2 This gene encodes a fast-twitch skeletal muscle protein, a member of the troponin I gene family, and a component of the troponin complex including troponin T, troponin C and troponin I subunits. The troponin complex, along with tropomyosin, is responsible for the calcium-dependent regulation of striated muscle contraction. Mouse studies show that this component is also present in vascular smooth muscle and may play a role in regulation of smooth muscle function. In addition to muscle tissues, this protein is found in corneal epithelium, cartilage where it is an inhibitor of angiogenesis to inhibit tumor growth and metastasis, and mammary gland where it functions as a co-activator of estrogen receptor-related receptor alpha. This protein also suppresses tumor growth in human ovarian carcinoma. Mutations in this gene cause myopathy and distal arthrogryposis type 2B. Alternatively spliced transcript variants have been found for this gene. 7136 NA
chymotrypsin like elastase family member 3A ENSG00000142789 CELA3A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. 10136 NA
6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 3 ENSG00000170525 PFKFB3 The protein encoded by this gene belongs to a family of bifunctional proteins that are involved in both the synthesis and degradation of fructose-2,6-bisphosphate, a regulatory molecule that controls glycolysis in eukaryotes. The encoded protein has a 6-phosphofructo-2-kinase activity that catalyzes the synthesis of fructose-2,6-bisphosphate (F2,6BP), and a fructose-2,6-biphosphatase activity that catalyzes the degradation of F2,6BP. This protein is required for cell cycle progression and prevention of apoptosis. It functions as a regulator of cyclin-dependent kinase 1, linking glucose metabolism to cell proliferation and survival in tumor cells. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene. 5209 NA
troponin T3, fast skeletal type ENSG00000130595 TNNT3 The binding of Ca(2+) to the trimeric troponin complex initiates the process of muscle contraction. Increased Ca(2+) concentrations produce a conformational change in the troponin complex that is transmitted to tropomyosin dimers situated along actin filaments. The altered conformation permits increased interaction between a myosin head and an actin filament which, ultimately, produces a muscle contraction. The troponin complex has protein subunits C, I, and T. Subunit C binds Ca(2+) and subunit I binds to actin and inhibits actin-myosin interaction. Subunit T binds the troponin complex to the tropomyosin complex and is also required for Ca(2+)-mediated activation of actomyosin ATPase activity. There are 3 different troponin T genes that encode tissue-specific isoforms of subunit T for fast skeletal-, slow skeletal-, and cardiac-muscle. This gene encodes fast skeletal troponin T protein; also known as troponin T type 3. Alternative splicing results in multiple transcript variants encoding additional distinct troponin T type 3 isoforms. A developmentally regulated switch between fetal/neonatal and adult troponin T type 3 isoforms occurs. Additional splice variants have been described but their biological validity has not been established. Mutations in this gene may cause distal arthrogryposis multiplex congenita type 2B (DA2B). 7140 NA
heat shock protein family B (small) member 7 ENSG00000173641 HSPB7 NA 27129 NA
SH3 and cysteine rich domain 3 ENSG00000185482 STAC3 The protein encoded by this gene is a component of the excitation-contraction coupling machinery of muscles. This protein is a member of the Stac gene family and contains an N-terminal cysteine-rich domain and two SH3 domains. Mutations in this gene are a cause of Native American myopathy. 246329 NA
myosin light chain 4 ENSG00000198336 MYL4 Myosin is a hexameric ATPase cellular motor protein. It is composed of two myosin heavy chains, two nonphosphorylatable myosin alkali light chains, and two phosphorylatable myosin regulatory light chains. This gene encodes a myosin alkali light chain that is found in embryonic muscle and adult atria. Two alternatively spliced transcript variants encoding the same protein have been found for this gene. 4635 NA
myosin light chain 9 ENSG00000101335 MYL9 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. 10398 NA
troponin T1, slow skeletal type ENSG00000105048 TNNT1 This gene encodes a protein that is a subunit of troponin, which is a regulatory complex located on the thin filament of the sarcomere. This complex regulates striated muscle contraction in response to fluctuations in intracellular calcium concentration. This complex is composed of three subunits: troponin C, which binds calcium, troponin T, which binds tropomyosin, and troponin I, which is an inhibitory subunit. This protein is the slow skeletal troponin T subunit. Mutations in this gene cause nemaline myopathy type 5, also known as Amish nemaline myopathy, a neuromuscular disorder characterized by muscle weakness and rod-shaped, or nemaline, inclusions in skeletal muscle fibers which affects infants, resulting in death due to respiratory insufficiency, usually in the second year. Multiple transcript variants encoding different isoforms have been found for this gene. 7138 NA
calsequestrin 2 ENSG00000118729 CASQ2 The protein encoded by this gene specifies the cardiac muscle family member of the calsequestrin family. Calsequestrin is localized to the sarcoplasmic reticulum in cardiac and slow skeletal muscle cells. The protein is a calcium binding protein that stores calcium for muscle function. Mutations in this gene cause stress-induced polymorphic ventricular tachycardia, also referred to as catecholaminergic polymorphic ventricular tachycardia 2 (CPVT2), a disease characterized by bidirectional ventricular tachycardia that may lead to cardiac arrest. 845 NA
solute carrier family 7 member 2 ENSG00000003989 SLC7A2 The protein encoded by this gene is a cationic amino acid transporter and a member of the APC (amino acid-polyamine-organocation) family of transporters. The encoded membrane protein is responsible for the cellular uptake of arginine, lysine and ornithine. Three transcript variants encoding different isoforms have been found for this gene. 6542 NA
NA ENSG00000259716 NA NA NA TRUE
RNA binding protein with multiple splicing 2 ENSG00000166831 RBPMS2 NA 348093 NA
carboxyl ester lipase ENSG00000170835 CEL The protein encoded by this gene is a glycoprotein secreted from the pancreas into the digestive tract and from the lactating mammary gland into human milk. The physiological role of this protein is in cholesterol and lipid-soluble vitamin ester hydrolysis and absorption. This encoded protein promotes large chylomicron production in the intestine. Also its presence in plasma suggests its interactions with cholesterol and oxidized lipoproteins to modulate the progression of atherosclerosis. In pancreatic tumoral cells, this encoded protein is thought to be sequestrated within the Golgi compartment and is probably not secreted. This gene contains a variable number of tandem repeat (VNTR) polymorphism in the coding region that may influence the function of the encoded protein. 1056 NA
colipase ENSG00000137392 CLPS The protein encoded by this gene is a cofactor needed by pancreatic lipase for efficient dietary lipid hydrolysis. It binds to the C-terminal, non-catalytic domain of lipase, thereby stabilizing an active conformation and considerably increasing the overall hydrophobic binding site. The gene product allows lipase to anchor noncovalently to the surface of lipid micelles, counteracting the destabilizing influence of intestinal bile salts. This cofactor is only expressed in pancreatic acinar cells, suggesting regulation of expression by tissue-specific elements. Three transcript variants encoding different isoforms have been found for this gene. 1208 NA
chymotrypsinogen B2 ENSG00000168928 CTRB2 NA 440387 NA
keratin 1 ENSG00000167768 KRT1 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3848 NA
microtubule associated monooxygenase, calponin and LIM domain containing 2 ENSG00000133816 MICAL2 NA 9645 NA
phosphorylase, glycogen, muscle ENSG00000068976 PYGM This gene encodes a muscle enzyme involved in glycogenolysis. Highly similar enzymes encoded by different genes are found in liver and brain. Mutations in this gene are associated with McArdle disease (myophosphorylase deficiency), a glycogen storage disease of muscle. Alternative splicing results in multiple transcript variants. 5837 NA
natriuretic peptide B ENSG00000120937 NPPB This gene is a member of the natriuretic peptide family and encodes a secreted protein which functions as a cardiac hormone. The protein undergoes two cleavage events, one within the cell and a second after secretion into the blood. The protein’s biological actions include natriuresis, diuresis, vasorelaxation, inhibition of renin and aldosterone secretion, and a key role in cardiovascular homeostasis. A high concentration of this protein in the bloodstream is indicative of heart failure. The protein also acts as an antimicrobial peptide with antibacterial and antifungal activity. Mutations in this gene have been associated with postmenopausal osteoporosis. 4879 NA
KIAA0368 ENSG00000136813 KIAA0368 NA 23392 NA
carbonic anhydrase 3 ENSG00000164879 CA3 Carbonic anhydrase III (CAIII) is a member of a multigene family (at least six separate genes are known) that encodes carbonic anhydrase isozymes. These carbonic anhydrases are a class of metalloenzymes that catalyze the reversible hydration of carbon dioxide and are differentially expressed in a number of cell types. The expression of the CA3 gene is strictly tissue specific and present at high levels in skeletal muscle and much lower levels in cardiac and smooth muscle. A proportion of carriers of Duchenne muscular dystrophy have a higher CA3 level than normal. The gene spans 10.3 kb and contains seven exons and six introns. 761 NA
chloride intracellular channel 4 ENSG00000169504 CLIC4 Chloride channels are a diverse group of proteins that regulate fundamental cellular processes including stabilization of cell membrane potential, transepithelial transport, maintenance of intracellular pH, and regulation of cell volume. Chloride intracellular channel 4 (CLIC4) protein, encoded by the CLIC4 gene, is a member of the p64 family; the gene is expressed in many tissues and exhibits an intracellular vesicular pattern in Panc-1 cells (pancreatic cancer cells). 25932 NA
complement component 3 ENSG00000125730 C3 Complement component C3 plays a central role in the activation of the complement system. Its activation is required for both classical and alternative complement activation pathways. The encoded preproprotein is proteolytically processed to generate alpha and beta subunits that form the mature protein, which is then further processed to generate numerous peptide products. The C3a peptide, also known as the C3a anaphylatoxin, modulates inflammation and possesses antimicrobial activity. Mutations in this gene are associated with atypical hemolytic uremic syndrome and age-related macular degeneration in human patients. 718 NA
eukaryotic translation initiation factor 4B ENSG00000063046 EIF4B NA 1975 NA
chymotrypsin like elastase family member 3B ENSG00000219073 CELA3B Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3B has little elastolytic activity. Like most of the human elastases, elastase 3B is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3B preferentially cleaves proteins after alanine residues. Elastase 3B may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1, and excretion of this protein in fecal material is frequently used as a measure of pancreatic function in clinical assays. 23436 NA
keratin 2 ENSG00000172867 KRT2 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3849 NA
MAP kinase interacting serine/threonine kinase 2 ENSG00000099875 MKNK2 This gene encodes a member of the calcium/calmodulin-dependent protein kinases (CAMK) Ser/Thr protein kinase family, which belongs to the protein kinase superfamily. This protein contains conserved DLG (asp-leu-gly) and ENIL (glu-asn-ile-leu) motifs, and an N-terminal polybasic region which binds importin A and the translation factor scaffold protein eukaryotic initiation factor 4G (eIF4G). This protein is one of the downstream kinases activated by mitogen-activated protein (MAP) kinases. It phosphorylates the eukaryotic initiation factor 4E (eIF4E), thus playing important roles in the initiation of mRNA translation, oncogenic transformation and malignant cell proliferation. In addition to eIF4E, this protein also interacts with von Hippel-Lindau tumor suppressor (VHL), ring-box 1 (Rbx1) and Cullin2 (Cul2), which are all components of the CBC(VHL) ubiquitin ligase E3 complex. Multiple alternatively spliced transcript variants have been found, but the full-length nature and biological activity of only two variants are determined. These two variants encode distinct isoforms which differ in activity and regulation, and in subcellular localization. 2872 NA
nebulette ENSG00000078114 NEBL This gene encodes a nebulin like protein that is abundantly expressed in cardiac muscle. The encoded protein binds actin and interacts with thin filaments and Z-line associated proteins in striated muscle. This protein may be involved in cardiac myofibril assembly. A shorter isoform of this protein termed LIM nebulette is expressed in non-muscle cells and may function as a component of focal adhesion complexes. Alternate splicing results in multiple transcript variants. 10529 NA
poly(A) binding protein cytoplasmic 4 ENSG00000090621 PABPC4 Poly(A)-binding proteins (PABPs) bind to the poly(A) tail present at the 3-prime ends of most eukaryotic mRNAs. PABPC4 or IPABP (inducible PABP) was isolated as an activation-induced T-cell mRNA encoding a protein. Activation of T cells increased PABPC4 mRNA levels in T cells approximately 5-fold. PABPC4 contains 4 RNA-binding domains and a proline-rich C terminus. PABPC4 is localized primarily to the cytoplasm. It is suggested that PABPC4 might be necessary for regulation of stability of labile mRNA species in activated T cells. PABPC4 was also identified as an antigen, APP1 (activated-platelet protein-1), expressed on thrombin-activated rabbit platelets. PABPC4 may also be involved in the regulation of protein translation in platelets and megakaryocytes or may participate in the binding or stabilization of polyadenylates in platelet dense granules. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 8761 NA
chymotrypsinogen B1 ENSG00000168925 CTRB1 The protein encoded by this gene is one of a family of serine proteases that is secreted into the gastrointestinal tract as an inactive precursor, which is activated by proteolytic cleavage with trypsin. 1504 NA
popeye domain containing 2 ENSG00000121577 POPDC2 This gene encodes a member of the POP family of proteins which contain three putative transmembrane domains. This membrane associated protein is predominantly expressed in skeletal and cardiac muscle, and may have an important function in these tissues. 64091 NA
forkhead box O1 ENSG00000150907 FOXO1 This gene belongs to the forkhead family of transcription factors which are characterized by a distinct forkhead domain. The specific function of this gene has not yet been determined; however, it may play a role in myogenic growth and differentiation. Translocation of this gene with PAX3 has been associated with alveolar rhabdomyosarcoma. 2308 NA
actin, beta ENSG00000075624 ACTB This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. 60 NA
myeloid-associated differentiation marker ENSG00000179820 MYADM NA 91663 NA
solute carrier family 25 member 4 ENSG00000151729 SLC25A4 This gene is a member of the mitochondrial carrier subfamily of solute carrier protein genes. The product of this gene functions as a gated pore that translocates ADP from the cytoplasm into the mitochondrial matrix and ATP from the mitochondrial matrix into the cytoplasm. The protein forms a homodimer embedded in the inner mitochondrial membrane. Mutations in this gene have been shown to result in autosomal dominant progressive external ophthalmoplegia and familial hypertrophic cardiomyopathy. 291 NA
vimentin ENSG00000026025 VIM This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. 7431 NA
tenascin C ENSG00000041982 TNC This gene encodes an extracellular matrix protein with a spatially and temporally restricted tissue distribution. This protein is homohexameric with disulfide-linked subunits, and contains multiple EGF-like and fibronectin type-III domains. It is implicated in guidance of migrating neurons as well as axons during development, synaptic plasticity, and neuronal regeneration. 3371 NA
carboxypeptidase A2 ENSG00000158516 CPA2 Three different forms of human pancreatic procarboxypeptidase A have been isolated. The encoded protein represents the A2 form, which is a monomeric protein with different biochemical properties from the A1 and A3 forms. The A2 form of pancreatic procarboxypeptidase acts on aromatic C-terminal residues and is a secreted protein. 1358 NA
laminin subunit alpha 5 ENSG00000130702 LAMA5 This gene encodes one of the vertebrate laminin alpha chains. Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 non-identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. The protein encoded by this gene is the alpha-5 subunit of laminin-10 (laminin-511), laminin-11 (laminin-521) and laminin-15 (laminin-523). 3911 NA
matrix remodeling associated 7 ENSG00000182534 MXRA7 NA 439921 NA
junctophilin 2 ENSG00000149596 JPH2 Junctional complexes between the plasma membrane and endoplasmic/sarcoplasmic reticulum are a common feature of all excitable cell types and mediate cross talk between cell surface and intracellular ion channels. The protein encoded by this gene is a component of junctional complexes and is composed of a C-terminal hydrophobic segment spanning the endoplasmic/sarcoplasmic reticulum membrane and a remaining cytoplasmic domain that shows specific affinity for the plasma membrane. This gene is a member of the junctophilin gene family. Alternative splicing has been observed at this locus and two variants encoding distinct isoforms are described. 57158 NA
phospholipase A2 group IB ENSG00000170890 PLA2G1B This gene encodes a secreted member of the phospholipase A2 (PLA2) class of enzymes, which is produced by the pancreatic acinar cells. The encoded calcium-dependent enzyme catalyzes the hydrolysis of the sn-2 position of membrane glycerophospholipids to release arachidonic acid (AA) and lysophospholipids. AA is subsequently converted by downstream metabolic enzymes to several bioactive lipophilic compounds (eicosanoids), including prostaglandins (PGs) and leukotrienes (LTs). The enzyme may be involved in several physiological processes including cell contraction, cell proliferation and pathological response. 5319 NA
DEP domain containing MTOR-interacting protein ENSG00000155792 DEPTOR NA 64798 NA
amylase, alpha 2A (pancreatic) ENSG00000243480 AMY2A This gene encodes a member of the alpha-amylase family of proteins. Amylases are secreted proteins that hydrolyze 1,4-alpha-glucoside bonds in oligosaccharides and polysaccharides, catalyzing the first step in digestion of dietary starch and glycogen. This gene and several family members are present in a gene cluster on chromosome 1. This gene encodes an amylase isoenzyme produced by the pancreas. 279 NA
protein phosphatase 1 regulatory subunit 27 ENSG00000182676 PPP1R27 NA 116729 NA
KIAA1217 ENSG00000120549 KIAA1217 NA 56243 NA
actin binding LIM protein family member 2 ENSG00000163995 ABLIM2 NA 84448 NA
lipin 1 ENSG00000134324 LPIN1 This gene encodes a magnesium-ion-dependent phosphatidic acid phosphohydrolase enzyme that catalyzes the penultimate step in triglyceride synthesis including the dephosphorylation of phosphatidic acid to yield diacylglycerol. Expression of this gene is required for adipocyte differentiation and it also functions as a nuclear transcriptional coactivator with some peroxisome proliferator-activated receptors to modulate expression of other genes involved in lipid metabolism. Mutations in this gene are associated with metabolic syndrome, type 2 diabetes, and autosomal recessive acute recurrent myoglobinuria (ARARM). This gene is also a candidate for several human lipodystrophy syndromes. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Additional splice variants have been described but their full-length structures have not been determined. 23175 NA
HECT domain E3 ubiquitin protein ligase 1 ENSG00000092148 HECTD1 NA 25831 NA
aarF domain containing kinase 3 ENSG00000163050 ADCK3 This gene encodes a mitochondrial protein similar to yeast ABC1, which functions in an electron-transferring membrane protein complex in the respiratory chain. It is not related to the family of ABC transporter proteins. Expression of this gene is induced by the tumor suppressor p53 and in response to DNA damage, and inhibiting its expression partially suppresses p53-induced apoptosis. Alternatively spliced transcript variants have been found; however, their full-length nature has not been determined. 56997 NA
fatty acid binding protein 4 ENSG00000170323 FABP4 FABP4 encodes the fatty acid binding protein found in adipocytes. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. It is thought that the roles of FABPs include fatty acid uptake, transport, and metabolism. 2167 NA
uncharacterized LOC100507537 ENSG00000240045 LOC100507537 NA 100507537 NA
family with sequence similarity 46 member B ENSG00000158246 FAM46B NA 115572 NA
protein phosphatase 1 regulatory subunit 12C ENSG00000125503 PPP1R12C The gene encodes a subunit of myosin phosphatase. The encoded protein regulates the catalytic activity of protein phosphatase 1 delta and assembly of the actin cytoskeleton. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 54776 NA
protein kinase AMP-activated non-catalytic subunit gamma 2 ENSG00000106617 PRKAG2 AMP-activated protein kinase (AMPK) is a heterotrimeric protein composed of a catalytic alpha subunit, a noncatalytic beta subunit, and a noncatalytic regulatory gamma subunit. Various forms of each of these subunits exist, encoded by different genes. AMPK is an important energy-sensing enzyme that monitors cellular energy status and functions by inactivating key enzymes involved in regulating de novo biosynthesis of fatty acid and cholesterol. This gene is a member of the AMPK gamma subunit family. Mutations in this gene have been associated with Wolff-Parkinson-White syndrome, familial hypertrophic cardiomyopathy, and glycogen storage disease of the heart. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. 51422 NA
collagen type VI alpha 2 ENSG00000142173 COL6A2 This gene encodes one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The product of this gene contains several domains similar to von Willebrand Factor type A domains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in this gene are associated with Bethlem myopathy and Ullrich scleroatonic muscular dystrophy. Three transcript variants have been identified for this gene. 1292 NA
uncoupling protein 3 ENSG00000175564 UCP3 Mitochondrial uncoupling proteins (UCP) are members of the larger family of mitochondrial anion carrier proteins (MACP). UCPs separate oxidative phosphorylation from ATP synthesis with energy dissipated as heat, also referred to as the mitochondrial proton leak. UCPs facilitate the transfer of anions from the inner to the outer mitochondrial membrane and the return transfer of protons from the outer to the inner mitochondrial membrane. They also reduce the mitochondrial membrane potential in mammalian cells. The different UCPs have tissue-specific expression; this gene is primarily expressed in skeletal muscle. This gene’s protein product is postulated to protect mitochondria against lipid-induced oxidative stress. Expression levels of this gene increase when fatty acid supplies to mitochondria exceed their oxidation capacity and the protein enables the export of fatty acids from mitochondria. UCPs contain the three solcar protein domains typically found in MACPs. Two splice variants have been found for this gene. 7352 NA
dehydrogenase/reductase 7 ENSG00000100612 DHRS7 This gene encodes a member of the short-chain dehydrogenases/reductases (SDR) family, which has over 46,000 members. Members in this family are enzymes that metabolize many different compounds, such as steroid hormones, prostaglandins, retinoids, lipids and xenobiotics. 51635 NA
vasodilator-stimulated phosphoprotein ENSG00000125753 VASP Vasodilator-stimulated phosphoprotein (VASP) is a member of the Ena-VASP protein family. Ena-VASP family members contain an EVH1 N-terminal domain that binds proteins containing E/DFPPPPXD/E motifs and targets Ena-VASP proteins to focal adhesions. In the mid-region of the protein, family members have a proline-rich domain that binds SH3 and WW domain-containing proteins. Their C-terminal EVH2 domain mediates tetramerization and binds both G and F actin. VASP is associated with filamentous actin formation and likely plays a widespread role in cell adhesion and motility. VASP may also be involved in the intracellular signaling pathways that regulate integrin-extracellular matrix interactions. VASP is regulated by the cyclic nucleotide-dependent kinases PKA and PKG. 7408 NA
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 19, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
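
The per-factor export above is repeated with only the factor index changing, so it could be wrapped in a small helper. The following is a minimal sketch, not the code used to produce this report: `write_factor_genes` and its arguments are hypothetical names, and `annot` is assumed to be a data frame with a `query` column, such as `as.data.frame(out)`. The toy data frame stands in for a real `mygene::queryMany()` result so the example runs without network access.

```r
# Hypothetical helper (not part of the original analysis): write the queried
# Ensembl IDs for factor k to <out_dir>/gene_names_clus_<k>.txt.
# `annot` is assumed to have a `query` column, as in as.data.frame(out).
write_factor_genes <- function(annot, k, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(as.character(annot$query), path,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)  # return the file path without printing it
}

# Toy stand-in for a query result; the two IDs are taken from the
# Factor 20 table in this report.
annot <- data.frame(query  = c("ENSG00000244734", "ENSG00000148795"),
                    symbol = c("HBB", "CYP17A1"),
                    stringsAsFactors = FALSE)

# tempdir() is used here only so the sketch does not touch ../utilities.
path <- write_factor_genes(annot, 20, tempdir())
readLines(path)
```

In the report itself, the call would presumably take `as.data.frame(out)` and the `../utilities/GTEX2013_sparse_fac_sqrt` directory in place of the toy inputs.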

Factor 20 Annotations

out <- mygene::queryMany(gene_list[20, ], scopes = "ensembl.gene", fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id symbol summary query name notfound
3043 HBB The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. ENSG00000244734 hemoglobin subunit beta NA
1586 CYP17A1 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. It has both 17alpha-hydroxylase and 17,20-lyase activities and is a key enzyme in the steroidogenic pathway that produces progestins, mineralocorticoids, glucocorticoids, androgens, and estrogens. Mutations in this gene are associated with isolated steroid-17 alpha-hydroxylase deficiency, 17-alpha-hydroxylase/17,20-lyase deficiency, pseudohermaphroditism, and adrenal hyperplasia. ENSG00000148795 cytochrome P450 family 17 subfamily A member 1 NA
1584 CYP11B1 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the mitochondrial inner membrane and is involved in the conversion of progesterone to cortisol in the adrenal cortex. Mutations in this gene cause congenital adrenal hyperplasia due to 11-beta-hydroxylase deficiency. Transcript variants encoding different isoforms have been noted for this gene. ENSG00000160882 cytochrome P450 family 11 subfamily B member 1 NA
5225 PGC This gene encodes an aspartic proteinase that belongs to the peptidase family A1. The encoded protein is a digestive enzyme that is produced in the stomach and constitutes a major component of the gastric mucosa. This protein is also secreted into the serum. This protein is synthesized as an inactive zymogen that includes a highly basic prosegment. This enzyme is converted into its active mature form at low pH by sequential cleavage of the prosegment that is carried out by the enzyme itself. Polymorphisms in this gene are associated with susceptibility to gastric cancers. Serum levels of this enzyme are used as a biomarker for certain gastric diseases including Helicobacter pylori related gastritis. Alternate splicing results in multiple transcript variants. A pseudogene of this gene is found on chromosome 1. ENSG00000096088 progastricsin NA
4155 MBP The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. ENSG00000197971 myelin basic protein NA
1674 DES This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. ENSG00000175084 desmin NA
3329 HSPD1 This gene encodes a member of the chaperonin family. The encoded mitochondrial protein may function as a signaling molecule in the innate immune system. This protein is essential for the folding and assembly of newly imported proteins in the mitochondria. This gene is adjacent to a related family member and the region between the 2 genes functions as a bidirectional promoter. Several pseudogenes have been associated with this gene. Two transcript variants encoding the same protein have been identified for this gene. Mutations associated with this gene cause autosomal recessive spastic paraplegia 13. ENSG00000144381 heat shock protein family D (Hsp60) member 1 NA
3040 HBA2 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. ENSG00000188536 hemoglobin subunit alpha 2 NA
7169 TPM2 This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000198467 tropomyosin 2 (beta) NA
231 AKR1B1 This gene encodes a member of the aldo/keto reductase superfamily, which consists of more than 40 known enzymes and proteins. This member catalyzes the reduction of a number of aldehydes, including the aldehyde form of glucose, and is thereby implicated in the development of diabetic complications by catalyzing the reduction of glucose to sorbitol. Multiple pseudogenes have been identified for this gene. The nomenclature system used by the HUGO Gene Nomenclature Committee to define human aldo-keto reductase family members is known to differ from that used by the Mouse Genome Informatics database. ENSG00000085662 aldo-keto reductase family 1 member B NA
6770 STAR The protein encoded by this gene plays a key role in the acute regulation of steroid hormone synthesis by enhancing the conversion of cholesterol into pregnenolone. This protein permits the cleavage of cholesterol into pregnenolone by mediating the transport of cholesterol from the outer mitochondrial membrane to the inner mitochondrial membrane. Mutations in this gene are a cause of congenital lipoid adrenal hyperplasia (CLAH), also called lipoid CAH. A pseudogene of this gene is located on chromosome 13. ENSG00000147465 steroidogenic acute regulatory protein NA
8513 LIPF This gene encodes gastric lipase, an enzyme involved in the digestion of dietary triglycerides in the gastrointestinal tract, and responsible for 30% of fat digestion processes occurring in human. It is secreted by gastric chief cells in the fundic mucosa of the stomach, and it hydrolyzes the ester bonds of triglycerides under acidic pH conditions. The gene is a member of a conserved gene family of lipases that play distinct roles in neutral lipid metabolism. Several transcript variants encoding different isoforms have been found for this gene. ENSG00000182333 lipase F, gastric type NA
7296 TXNRD1 This gene encodes a member of the family of pyridine nucleotide oxidoreductases. This protein reduces thioredoxins as well as other substrates, and plays a role in selenium metabolism and protection against oxidative stress. The functional enzyme is thought to be a homodimer which uses FAD as a cofactor. Each subunit contains a selenocysteine (Sec) residue which is required for catalytic activity. The selenocysteine is encoded by the UGA codon that normally signals translation termination. The 3’ UTR of selenocysteine-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), that is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. Alternative splicing results in several transcript variants encoding the same or different isoforms. ENSG00000198431 thioredoxin reductase 1 NA
NA NA NA ENSG00000090920 NA TRUE
493 ATP2B4 The protein encoded by this gene belongs to the family of P-type primary ion transport ATPases characterized by the formation of an aspartyl phosphate intermediate during the reaction cycle. These enzymes remove bivalent calcium ions from eukaryotic cells against very large concentration gradients and play a critical role in intracellular calcium homeostasis. The mammalian plasma membrane calcium ATPase isoforms are encoded by at least four separate genes and the diversity of these enzymes is further increased by alternative splicing of transcripts. The expression of different isoforms and splice variants is regulated in a developmental, tissue- and cell type-specific manner, suggesting that these pumps are functionally adapted to the physiological needs of particular cells and tissues. This gene encodes the plasma membrane calcium ATPase isoform 4. Alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000058668 ATPase plasma membrane Ca2+ transporting 4 NA
1589 CYP21A2 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum and hydroxylates steroids at the 21 position. Its activity is required for the synthesis of steroid hormones including cortisol and aldosterone. Mutations in this gene cause congenital adrenal hyperplasia. A related pseudogene is located near this gene; gene conversion events involving the functional gene and the pseudogene are thought to account for many cases of steroid 21-hydroxylase deficiency. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000231852 cytochrome P450 family 21 subfamily A member 2 NA
2355 FOSL2 The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. ENSG00000075426 FOS like 2, AP-1 transcription factor subunit NA
3312 HSPA8 This gene encodes a member of the heat shock protein 70 family, which contains both heat-inducible and constitutively expressed members. This protein belongs to the latter group, which are also referred to as heat-shock cognate proteins. It functions as a chaperone, and binds to nascent polypeptides to facilitate correct folding. It also functions as an ATPase in the disassembly of clathrin-coated vesicles during transport of membrane components through the cell. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000109971 heat shock protein family A (Hsp70) member 8 NA
57153 SLC44A2 NA ENSG00000129353 solute carrier family 44 member 2 NA
10398 MYL9 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000101335 myosin light chain 9 NA
3858 KRT10 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. ENSG00000186395 keratin 10 NA
3320 HSP90AA1 The protein encoded by this gene is an inducible molecular chaperone that functions as a homodimer. The encoded protein aids in the proper folding of specific target proteins by use of an ATPase activity that is modulated by co-chaperones. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000080824 heat shock protein 90kDa alpha family class A member 1 NA
6774 STAT3 The protein encoded by this gene is a member of the STAT protein family. In response to cytokines and growth factors, STAT family members are phosphorylated by the receptor associated kinases, and then form homo- or heterodimers that translocate to the cell nucleus where they act as transcription activators. This protein is activated through phosphorylation in response to various cytokines and growth factors including IFNs, EGF, IL5, IL6, HGF, LIF and BMP2. This protein mediates the expression of a variety of genes in response to cell stimuli, and thus plays a key role in many cellular processes such as cell growth and apoptosis. The small GTPase Rac1 has been shown to bind and regulate the activity of this protein. PIAS3 protein is a specific inhibitor of this protein. Mutations in this gene are associated with infantile-onset multisystem autoimmune disease and hyper-immunoglobulin E syndrome. Alternative splicing results in multiple transcript variants encoding distinct isoforms. ENSG00000168610 signal transducer and activator of transcription 3 NA
1583 CYP11A1 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the mitochondrial inner membrane and catalyzes the conversion of cholesterol to pregnenolone, the first and rate-limiting step in the synthesis of the steroid hormones. Two transcript variants encoding different isoforms have been found for this gene. The cellular location of the smaller isoform is unclear since it lacks the mitochondrial-targeting transit peptide. ENSG00000140459 cytochrome P450 family 11 subfamily A member 1 NA
ENSG00000266844 RP11-862L9.3 NA ENSG00000266844 NA NA
211 ALAS1 This gene encodes the mitochondrial enzyme which catalyzes the rate-limiting step in heme (iron-protoporphyrin) biosynthesis. The enzyme encoded by this gene is the housekeeping enzyme; a separate gene encodes a form of the enzyme that is specific for erythroid tissue. The level of the mature encoded protein is regulated by heme: high levels of heme down-regulate the mature enzyme in mitochondria while low heme levels up-regulate. A pseudogene of this gene is located on chromosome 12. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000023330 5’-aminolevulinate synthase 1 NA
730 C7 C7 is a component of the complement system. It participates in the formation of Membrane Attack Complex (MAC). People with C7 deficiency are prone to bacterial infection. ENSG00000112936 complement component 7 NA
NA NA NA ENSG00000259716 NA TRUE
3880 KRT19 The protein encoded by this gene is a member of the keratin family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. The type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. Unlike its related family members, this smallest known acidic cytokeratin is not paired with a basic cytokeratin in epithelial cells. It is specifically expressed in the periderm, the transiently superficial layer that envelopes the developing epidermis. The type I cytokeratins are clustered in a region of chromosome 17q12-q21. ENSG00000171345 keratin 19 NA
10963 STIP1 STIP1 is an adaptor protein that coordinates the functions of HSP70 (see HSPA1A; MIM 140550) and HSP90 (see HSP90AA1; MIM 140571) in protein folding. It is thought to assist in the transfer of proteins from HSP70 to HSP90 by binding both HSP90 and substrate-bound HSP70. STIP1 also stimulates the ATPase activity of HSP70 and inhibits the ATPase activity of HSP90, suggesting that it regulates both the conformations and ATPase cycles of these chaperones (Song and Masison, 2005 [PubMed 16100115]). ENSG00000168439 stress induced phosphoprotein 1 NA
7057 THBS1 The protein encoded by this gene is a subunit of a disulfide-linked homotrimeric protein. This protein is an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein can bind to fibrinogen, fibronectin, laminin, type V collagen and integrins alpha-V/beta-1. This protein has been shown to play roles in platelet aggregation, angiogenesis, and tumorigenesis. ENSG00000137801 thrombospondin 1 NA
2597 GAPDH This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. ENSG00000111640 glyceraldehyde-3-phosphate dehydrogenase NA
643834 PGA3 This gene encodes a protein precursor of the digestive enzyme pepsin, a member of the peptidase A1 family of endopeptidases. The encoded precursor is secreted by gastric chief cells and undergoes autocatalytic cleavage in acidic conditions to form the active enzyme, which functions in the digestion of dietary proteins. This gene is found in a cluster of related genes on chromosome 11, each of which encodes one of multiple pepsinogens. Pepsinogen levels in serum may serve as a biomarker for atrophic gastritis and gastric cancer. ENSG00000229859 pepsinogen 3, group I (pepsinogen A) NA
4629 MYH11 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000133392 myosin, heavy chain 11, smooth muscle NA
3313 HSPA9 This gene encodes a member of the heat shock protein 70 gene family. The encoded protein is primarily localized to the mitochondria but is also found in the endoplasmic reticulum, plasma membrane and cytoplasmic vesicles. This protein is a heat-shock cognate protein. This protein plays a role in cell proliferation, stress response and maintenance of the mitochondria. A pseudogene of this gene is found on chromosome 2. ENSG00000113013 heat shock protein family A (Hsp70) member 9 NA
83543 AIF1L NA ENSG00000126878 allograft inflammatory factor 1 like NA
2670 GFAP This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. ENSG00000131095 glial fibrillary acidic protein NA
11117 EMILIN1 This gene encodes an extracellular matrix glycoprotein that is characterized by an N-terminal microfibril interface domain, a coiled-coil alpha-helical domain, a collagenous domain and a C-terminal globular C1q domain. The encoded protein associates with elastic fibers at the interface between elastin and microfibrils and may play a role in the development of elastic tissues including large blood vessels, dermis, heart and lung. ENSG00000138080 elastin microfibril interfacer 1 NA
170954 PPP1R18 Protein phosphatase-1 (PP1; see MIM 176875) interacts with regulatory subunits that target the enzyme to different cellular locations and change its activity toward specific substrates. Phostensin is a regulatory subunit that targets PP1 to F-actin (see MIM 102610) cytoskeleton (Kao et al., 2007 [PubMed 17374523]). ENSG00000146112 protein phosphatase 1 regulatory subunit 18 NA
3326 HSP90AB1 This gene encodes a member of the heat shock protein 90 family; these proteins are involved in signal transduction, protein folding and degradation and morphological evolution. This gene encodes the constitutive form of the cytosolic 90 kDa heat-shock protein and is thought to play a role in gastric apoptosis and inflammation. Alternative splicing results in multiple transcript variants. Pseudogenes have been identified on multiple chromosomes. ENSG00000096384 heat shock protein 90kDa alpha family class B member 1 NA
60 ACTB This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. ENSG00000075624 actin, beta NA
1981 EIF4G1 The protein encoded by this gene is a component of the multi-subunit protein complex EIF4F. This complex facilitates the recruitment of mRNA to the ribosome, which is a rate-limiting step during the initiation phase of protein synthesis. The recognition of the mRNA cap and the ATP-dependent unwinding of 5’-terminal secondary structure is catalyzed by factors in this complex. The subunit encoded by this gene is a large scaffolding protein that contains binding sites for other members of the EIF4F complex. A domain at its N-terminus can also interact with the poly(A)-binding protein, which may mediate the circularization of mRNA during translation. Alternative splicing results in multiple transcript variants, some of which are derived from alternative promoter usage. ENSG00000114867 eukaryotic translation initiation factor 4 gamma 1 NA
11067 C10orf10 The expression of this gene is induced by fasting as well as by progesterone. The protein encoded by this gene contains a t-synaptosome-associated protein receptor (SNARE) coiled-coil homology domain and a peroxisomal targeting signal. Production of the encoded protein leads to phosphorylation and activation of the transcription factor ELK1. ENSG00000165507 chromosome 10 open reading frame 10 NA
6768 ST14 The protein encoded by this gene is an epithelial-derived, integral membrane serine protease. This protease forms a complex with the Kunitz-type serine protease inhibitor, HAI-1, and is found to be activated by sphingosine 1-phosphate. This protease has been shown to cleave and activate hepatocyte growth factor/scattering factor, and urokinase plasminogen activator, which suggest the function of this protease as an epithelial membrane activator for other proteases and latent growth factors. The expression of this protease has been associated with breast, colon, prostate, and ovarian tumors, which implicates its role in cancer invasion, and metastasis. ENSG00000149418 suppression of tumorigenicity 14 NA
1200 TPP1 This gene encodes a member of the sedolisin family of serine proteases. The protease functions in the lysosome to cleave N-terminal tripeptides from substrates, and has weaker endopeptidase activity. It is synthesized as a catalytically-inactive enzyme which is activated and auto-proteolyzed upon acidification. Mutations in this gene result in late-infantile neuronal ceroid lipofuscinosis, which is associated with the failure to degrade specific neuropeptides and a subunit of ATP synthase in the lysosome. ENSG00000166340 tripeptidyl peptidase 1 NA
3039 HBA1 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. ENSG00000206172 hemoglobin subunit alpha 1 NA
84440 RAB11FIP4 Proteins of the large Rab GTPase family (see RAB1A; MIM 179508) have regulatory roles in the formation, targeting, and fusion of intracellular transport vesicles. RAB11FIP4 is one of many proteins that interact with and regulate Rab GTPases (Hales et al., 2001 [PubMed 11495908]). ENSG00000131242 RAB11 family interacting protein 4 NA
3849 KRT2 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000172867 keratin 2 NA
6515 SLC2A3 NA ENSG00000059804 solute carrier family 2 member 3 NA
5738 PTGFRN NA ENSG00000134247 prostaglandin F2 receptor inhibitor NA
9231 DLG5 This gene encodes a member of the family of discs large (DLG) homologs, a subset of the membrane-associated guanylate kinase (MAGUK) superfamily. The MAGUK proteins are composed of a catalytically inactive guanylate kinase domain, in addition to PDZ and SH3 domains, and are thought to function as scaffolding molecules at sites of cell-cell contact. The protein encoded by this gene localizes to the plasma membrane and cytoplasm, and interacts with components of adherens junctions and the cytoskeleton. It is proposed to function in the transmission of extracellular signals to the cytoskeleton and in the maintenance of epithelial cell structure. Alternative splice variants have been described but their biological nature has not been determined. ENSG00000151208 discs large MAGUK scaffold protein 5 NA
3301 DNAJA1 This gene encodes a member of the DnaJ family of proteins, which act as heat shock protein 70 cochaperones. Heat shock proteins facilitate protein folding, trafficking, prevention of aggregation, and proteolytic degradation. Members of this family are characterized by a highly conserved N-terminal J domain, a glycine/phenylalanine-rich region, four CxxCxGxG zinc finger repeats, and a C-terminal substrate-binding domain. The J domain mediates the interaction with heat shock protein 70 to recruit substrates and regulate ATP hydrolysis activity. In humans, this gene has been implicated in positive regulation of virus replication through co-option by the influenza A virus. Several pseudogenes of this gene are found on other chromosomes. ENSG00000086061 DnaJ heat shock protein family (Hsp40) member A1 NA
9547 CXCL14 This antimicrobial gene belongs to the cytokine gene family which encode secreted proteins involved in immunoregulatory and inflammatory processes. The protein encoded by this gene is structurally related to the CXC (Cys-X-Cys) subfamily of cytokines. Members of this subfamily are characterized by two cysteines separated by a single amino acid. This cytokine displays chemotactic activity for monocytes but not for lymphocytes, dendritic cells, neutrophils or macrophages. It has been implicated that this cytokine is involved in the homeostasis of monocyte-derived macrophages rather than in inflammation. ENSG00000145824 C-X-C motif chemokine ligand 14 NA
721 C4B This gene encodes the basic form of complement factor 4, part of the classical activation pathway. The protein is expressed as a single chain precursor which is proteolytically cleaved into a trimer of alpha, beta, and gamma chains prior to secretion. The trimer provides a surface for interaction between the antigen-antibody complex and other complement components. The alpha chain may be cleaved to release C4 anaphylatoxin, a mediator of local inflammation. Deficiency of this protein is associated with systemic lupus erythematosus. This gene localizes to the major histocompatibility complex (MHC) class III region on chromosome 6. Varying haplotypes of this gene cluster exist, such that individuals may have 1, 2, or 3 copies of this gene. In addition, this gene exists as a long form and a short form due to the presence or absence of a 6.4 kb endogenous HERV-K retrovirus in intron 9. ENSG00000224389 complement component 4B (Chido blood group) NA
3291 HSD11B2 There are at least two isozymes of the corticosteroid 11-beta-dehydrogenase, a microsomal enzyme complex responsible for the interconversion of cortisol and cortisone. The type I isozyme has both 11-beta-dehydrogenase (cortisol to cortisone) and 11-oxoreductase (cortisone to cortisol) activities. The type II isozyme, encoded by this gene, has only 11-beta-dehydrogenase activity. In aldosterone-selective epithelial tissues such as the kidney, the type II isozyme catalyzes the glucocorticoid cortisol to the inactive metabolite cortisone, thus preventing illicit activation of the mineralocorticoid receptor. In tissues that do not express the mineralocorticoid receptor, such as the placenta and testis, it protects cells from the growth-inhibiting and/or pro-apoptotic effects of cortisol, particularly during embryonic development. Mutations in this gene cause the syndrome of apparent mineralocorticoid excess and hypertension. ENSG00000176387 hydroxysteroid 11-beta dehydrogenase 2 NA
1278 COL1A2 This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. ENSG00000164692 collagen type I alpha 2 chain NA
2335 FN1 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. ENSG00000115414 fibronectin 1 NA
84152 PPP1R1B This gene encodes a bifunctional signal transduction molecule. Dopaminergic and glutamatergic receptor stimulation regulates its phosphorylation and function as a kinase or phosphatase inhibitor. As a target for dopamine, this gene may serve as a therapeutic target for neurologic and psychiatric disorders. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000131771 protein phosphatase 1 regulatory inhibitor subunit 1B NA
3091 HIF1A This gene encodes the alpha subunit of transcription factor hypoxia-inducible factor-1 (HIF-1), which is a heterodimer composed of an alpha and a beta subunit. HIF-1 functions as a master regulator of cellular and systemic homeostatic response to hypoxia by activating transcription of many genes, including those involved in energy metabolism, angiogenesis, apoptosis, and other genes whose protein products increase oxygen delivery or facilitate metabolic adaptation to hypoxia. HIF-1 thus plays an essential role in embryonic vascularization, tumor angiogenesis and pathophysiology of ischemic disease. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. ENSG00000100644 hypoxia inducible factor 1 alpha subunit NA
9026 HIP1R NA ENSG00000130787 huntingtin interacting protein 1 related NA
8727 CTNNAL1 NA ENSG00000119326 catenin alpha like 1 NA
2192 FBLN1 Fibulin 1 is a secreted glycoprotein that becomes incorporated into a fibrillar extracellular matrix. Calcium-binding is apparently required to mediate its binding to laminin and nidogen. It mediates platelet adhesion via binding fibrinogen. Four splice variants which differ in the 3’ end have been identified. Each variant encodes a different isoform, but no functional distinctions have been identified among the four variants. ENSG00000077942 fibulin 1 NA
1634 DCN This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. ENSG00000011465 decorin NA
6662 SOX9 The protein encoded by this gene recognizes the sequence CCTTGAG along with other members of the HMG-box class DNA-binding proteins. It acts during chondrocyte differentiation and, with steroidogenic factor 1, regulates transcription of the anti-Muellerian hormone (AMH) gene. Deficiencies lead to the skeletal malformation syndrome campomelic dysplasia, frequently with sex reversal. ENSG00000125398 SRY-box 9 NA
3949 LDLR The low density lipoprotein receptor (LDLR) gene family consists of cell surface proteins involved in receptor-mediated endocytosis of specific ligands. Low density lipoprotein (LDL) is normally bound at the cell membrane and taken into the cell ending up in lysosomes where the protein is degraded and the cholesterol is made available for repression of microsomal enzyme 3-hydroxy-3-methylglutaryl coenzyme A (HMG CoA) reductase, the rate-limiting step in cholesterol synthesis. At the same time, a reciprocal stimulation of cholesterol ester synthesis takes place. Mutations in this gene cause the autosomal dominant disorder, familial hypercholesterolemia. Alternate splicing results in multiple transcript variants. ENSG00000130164 low density lipoprotein receptor NA
10974 ADIRF APM2 gene is exclusively expressed in adipose tissue. Its function is currently unknown. ENSG00000148671 adipogenesis regulatory factor NA
7538 ZFP36 NA ENSG00000128016 ZFP36 ring finger protein NA
716 C1S This gene encodes a serine protease, which is a major constituent of the human complement subcomponent C1. C1s associates with two other complement components C1r and C1q in order to yield the first component of the serum complement system. Defects in this gene are the cause of selective C1s deficiency. ENSG00000182326 complement component 1, s subcomponent NA
25959 KANK2 NA ENSG00000197256 KN motif and ankyrin repeat domains 2 NA
2813 GP2 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. ENSG00000169347 glycoprotein 2 NA
4616 GADD45B This gene is a member of a group of genes whose transcript levels are increased following stressful growth arrest conditions and treatment with DNA-damaging agents. The genes in this group respond to environmental stresses by mediating activation of the p38/JNK pathway. This activation is mediated via their proteins binding and activating MTK1/MEKK4 kinase, which is an upstream activator of both p38 and JNK MAPKs. The function of these genes or their protein products is involved in the regulation of growth and apoptosis. These genes are regulated by different mechanisms, but they are often coordinately expressed and can function cooperatively in inhibiting cell growth. ENSG00000099860 growth arrest and DNA damage inducible beta NA
3925 STMN1 This gene belongs to the stathmin family of genes. It encodes a ubiquitous cytosolic phosphoprotein proposed to function as an intracellular relay integrating regulatory signals of the cellular environment. The encoded protein is involved in the regulation of the microtubule filament system by destabilizing microtubules. It prevents assembly and promotes disassembly of microtubules. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000117632 stathmin 1 NA
7168 TPM1 This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. ENSG00000140416 tropomyosin 1 (alpha) NA
8565 YARS Aminoacyl-tRNA synthetases catalyze the aminoacylation of tRNA by their cognate amino acid. Because of their central role in linking amino acids with nucleotide triplets contained in tRNAs, aminoacyl-tRNA synthetases are thought to be among the first proteins that appeared in evolution. Tyrosyl-tRNA synthetase belongs to the class I tRNA synthetase family. Cytokine activities have also been observed for the human tyrosyl-tRNA synthetase, after it is split into two parts, an N-terminal fragment that harbors the catalytic site and a C-terminal fragment found only in the mammalian enzyme. The N-terminal fragment is an interleukin-8-like cytokine, whereas the released C-terminal fragment is an EMAP II-like cytokine. ENSG00000134684 tyrosyl-tRNA synthetase NA
3482 IGF2R This gene encodes a receptor for both insulin-like growth factor 2 and mannose 6-phosphate. The binding sites for each ligand are located on different segments of the protein. This receptor has various functions, including in the intracellular trafficking of lysosomal enzymes, the activation of transforming growth factor beta, and the degradation of insulin-like growth factor 2. Mutation or loss of heterozygosity of this gene has been associated with risk of hepatocellular carcinoma. The orthologous mouse gene is imprinted and shows exclusive expression from the maternal allele; however, imprinting of the human gene may be polymorphic, as only a minority of individuals showed biased expression from the maternal allele (PMID:8267611). ENSG00000197081 insulin like growth factor 2 receptor NA
64129 TINAGL1 The protein encoded by this gene is similar in sequence to tubulointerstitial nephritis antigen, a secreted glycoprotein that is recognized by antibodies in some types of immune-related tubulointerstitial nephritis. Three transcript variants encoding different isoforms have been found for this gene. ENSG00000142910 tubulointerstitial nephritis antigen like 1 NA
2230 FDX1 This gene encodes a small iron-sulfur protein that transfers electrons from NADPH through ferredoxin reductase to mitochondrial cytochrome P450, involved in steroid, vitamin D, and bile acid metabolism. Pseudogenes of this functional gene are found on chromosomes 20 and 21. ENSG00000137714 ferredoxin 1 NA
56287 GKN1 The protein encoded by this gene is found to be down-regulated in human gastric cancer tissue as compared to normal gastric mucosa. ENSG00000169605 gastrokine 1 NA
4783 NFIL3 The protein encoded by this gene is a transcriptional regulator that binds as a homodimer to activating transcription factor (ATF) sites in many cellular and viral promoters. The encoded protein represses PER1 and PER2 expression and therefore plays a role in the regulation of circadian rhythm. Three transcript variants encoding the same protein have been found for this gene. ENSG00000165030 nuclear factor, interleukin 3 regulated NA
4921 DDR2 Receptor tyrosine kinases (RTKs) play a key role in the communication of cells with their microenvironment. These molecules are involved in the regulation of cell growth, differentiation, and metabolism. In several cases the biochemical mechanism by which RTKs transduce signals across the membrane has been shown to be ligand induced receptor oligomerization and subsequent intracellular phosphorylation. This autophosphorylation leads to phosphorylation of cytosolic targets as well as association with other molecules, which are involved in pleiotropic effects of signal transduction. RTKs have a tripartite structure with extracellular, transmembrane, and cytoplasmic regions. This gene encodes a member of a novel subclass of RTKs and contains a distinct extracellular region encompassing a factor VIII-like domain. Alternative splicing in the 5’ UTR results in multiple transcript variants encoding the same protein. ENSG00000162733 discoidin domain receptor tyrosine kinase 2 NA
3576 CXCL8 The protein encoded by this gene is a member of the CXC chemokine family. This chemokine is one of the major mediators of the inflammatory response. This chemokine is secreted by several cell types. It functions as a chemoattractant, and is also a potent angiogenic factor. This gene is believed to play a role in the pathogenesis of bronchiolitis, a common respiratory tract disease caused by viral infection. This gene and other ten members of the CXC chemokine gene family form a chemokine gene cluster in a region mapped to chromosome 4q. ENSG00000169429 C-X-C motif chemokine ligand 8 NA
200081 TXLNA NA ENSG00000084652 taxilin alpha NA
720 C4A This gene encodes the acidic form of complement factor 4, part of the classical activation pathway. The protein is expressed as a single chain precursor which is proteolytically cleaved into a trimer of alpha, beta, and gamma chains prior to secretion. The trimer provides a surface for interaction between the antigen-antibody complex and other complement components. The alpha chain is cleaved to release C4 anaphylatoxin, an antimicrobial peptide and a mediator of local inflammation. Deficiency of this protein is associated with systemic lupus erythematosus and type I diabetes mellitus. This gene localizes to the major histocompatibility complex (MHC) class III region on chromosome 6. Varying haplotypes of this gene cluster exist, such that individuals may have 1, 2, or 3 copies of this gene. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000244731 complement component 4A (Rodgers blood group) NA
8404 SPARCL1 NA ENSG00000152583 SPARC like 1 NA
6692 SPINT1 The protein encoded by this gene is a member of the Kunitz family of serine protease inhibitors. The protein is a potent inhibitor specific for HGF activator and is thought to be involved in the regulation of the proteolytic activation of HGF in injured tissues. Alternative splicing results in multiple variants encoding different isoforms. ENSG00000166145 serine peptidase inhibitor, Kunitz type 1 NA
22904 SBNO2 NA ENSG00000064932 strawberry notch homolog 2 (Drosophila) NA
1026 CDKN1A This gene encodes a potent cyclin-dependent kinase inhibitor. The encoded protein binds to and inhibits the activity of cyclin-cyclin-dependent kinase2 or -cyclin-dependent kinase4 complexes, and thus functions as a regulator of cell cycle progression at G1. The expression of this gene is tightly controlled by the tumor suppressor protein p53, through which this protein mediates the p53-dependent cell cycle G1 phase arrest in response to a variety of stress stimuli. This protein can interact with proliferating cell nuclear antigen, a DNA polymerase accessory factor, and plays a regulatory role in S phase DNA replication and DNA damage repair. This protein was reported to be specifically cleaved by CASP3-like caspases, which thus leads to a dramatic activation of cyclin-dependent kinase2, and may be instrumental in the execution of apoptosis following caspase activation. Mice that lack this gene have the ability to regenerate damaged or missing tissue. Multiple alternatively spliced variants have been found for this gene. ENSG00000124762 cyclin-dependent kinase inhibitor 1A NA
3566 IL4R This gene encodes the alpha chain of the interleukin-4 receptor, a type I transmembrane protein that can bind interleukin 4 and interleukin 13 to regulate IgE production. The encoded protein also can bind interleukin 4 to promote differentiation of Th2 cells. A soluble form of the encoded protein can be produced by proteolysis of the membrane-bound protein, and this soluble form can inhibit IL4-mediated cell proliferation and IL5 upregulation by T-cells. Allelic variations in this gene have been associated with atopy, a condition that can manifest itself as allergic rhinitis, sinusitis, asthma, or eczema. Polymorphisms in this gene are also associated with resistance to human immunodeficiency virus type-1 infection. Alternate splicing results in multiple transcript variants. ENSG00000077238 interleukin 4 receptor NA
9875 URB1 NA ENSG00000142207 URB1 ribosome biogenesis 1 homolog (S. cerevisiae) NA
5968 REG1B This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. ENSG00000172023 regenerating family member 1 beta NA
29842 TFCP2L1 NA ENSG00000115112 transcription factor CP2-like 1 NA
283229 CRACR2B NA ENSG00000177685 calcium release activated channel regulator 2B NA
80210 ARMC9 NA ENSG00000135931 armadillo repeat containing 9 NA
301 ANXA1 This gene encodes a membrane-localized protein that binds phospholipids. This protein inhibits phospholipase A2 and has anti-inflammatory activity. Loss of function or expression of this gene has been detected in multiple tumors. ENSG00000135046 annexin A1 NA
3848 KRT1 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000167768 keratin 1 NA
5644 PRSS1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. ENSG00000204983 protease, serine 1 NA
3913 LAMB2 Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins, composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively), form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. Several isoforms of each chain have been described. Different alpha, beta and gamma chain isomers combine to give rise to different heterotrimeric laminin isoforms which are designated by Arabic numerals in the order of their discovery, i.e. alpha1beta1gamma1 heterotrimer is laminin 1. The biological functions of the different chains and trimer molecules are largely unknown, but some of the chains have been shown to differ with respect to their tissue distribution, presumably reflecting diverse functions in vivo. This gene encodes the beta chain isoform laminin, beta 2. The beta 2 chain contains the 7 structural domains typical of beta chains of laminin, including the short alpha region. However, unlike beta 1 chain, beta 2 has a more restricted tissue distribution. It is enriched in the basement membrane of muscles at the neuromuscular junctions, kidney glomerulus and vascular smooth muscle. Transgenic mice in which the beta 2 chain gene was inactivated by homologous recombination, showed defects in the maturation of neuromuscular junctions and impairment of glomerular filtration. Alternative splicing involving a non consensus 5’ splice site (gc) in the 5’ UTR of this gene has been reported. It was suggested that inefficient splicing of this first intron, which does not change the protein sequence, results in a greater abundance of the unspliced form of the transcript than the spliced form. The full-length nature of the spliced transcript is not known. ENSG00000172037 laminin subunit beta 2 NA
65108 MARCKSL1 This gene encodes a member of the myristoylated alanine-rich C-kinase substrate (MARCKS) family. Members of this family play a role in cytoskeletal regulation, protein kinase C signaling and calmodulin signaling. The encoded protein affects the formation of adherens junction. Alternative splicing results in multiple transcript variants. Pseudogenes of this gene are located on the long arm of chromosomes 6 and 10. ENSG00000175130 MARCKS like 1 NA
1357 CPA1 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. ENSG00000091704 carboxypeptidase A1 NA
3960 LGALS4 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. The expression of this gene is restricted to small intestine, colon, and rectum, and it is underexpressed in colorectal cancer. ENSG00000171747 galectin 4 NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_",20,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);

GTEx 2013 Factor analysis (sparse factors: voom counts)

lambda_out <- read.table("../sfa_outputs/GTEX2013_transpose/voom_gtex/gtex_voom_transpose_lambda.out");
f_out <- read.table("../sfa_outputs/GTEX2013_transpose/voom_gtex/gtex_voom_transpose_F.out");

gene_names <- as.vector(as.matrix(read.table("../sfa_inputs/gene_names_GTEX_V6.txt")));
gene_names <- substring(gene_names,1,15);
xli  <-  gene_names;

indices_mat <- SFA.ExtractTopFeatures(lambda_out, top_features = 100, options = "min",mult.annotate = TRUE)


gene_list <- do.call(rbind, lapply(1:dim(indices_mat)[1], function(x) gene_names[indices_mat[x,]]))
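
The index-to-name lookup above can be checked on a small example. The sketch below uses made-up gene names and a made-up index matrix (both hypothetical, not taken from the GTEx data) and assumes, as the code above does, that each row of `indices_mat` holds the top-feature indices for one factor. It also shows a vectorised lookup that should give the same matrix.

```r
# Toy stand-ins (hypothetical values, not the GTEx data).
gene_names  <- c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D", "ENSG_E")
indices_mat <- rbind(c(5, 1, 3),   # top-feature indices for factor 1
                     c(2, 4, 1))   # top-feature indices for factor 2

# Row-by-row lookup, mirroring the analysis code; seq_len() is used
# instead of 1:n so that an empty index matrix yields no iterations.
gene_list <- do.call(rbind, lapply(seq_len(nrow(indices_mat)),
                                   function(x) gene_names[indices_mat[x, ]]))

# Vectorised alternative: index once (column-major order), then restore
# the original shape with the same number of rows.
gene_list_vec <- matrix(gene_names[indices_mat], nrow = nrow(indices_mat))

print(gene_list)
```

Row `i` of `gene_list` then holds the gene identifiers passed to the annotation query for factor `i`.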

SFA loadings plot

samples_id <- read.table("../sfa_inputs/samples_id.txt");

# Tissue label of each sample (third column of the sample sheet)
tissue_labels <- samples_id[ ,3]

tissue_levels <- unique(tissue_labels);


cumsum_val <- c(1,cumsum(as.numeric(table(tissue_labels))))
cumsum_low <- cumsum_val[1:(length(cumsum_val)-1)]
cumsum_high <- cumsum_val[2:(length(cumsum_val))];
cumsum_mean <- 0.5*(cumsum_low+cumsum_high)

for(k in 1:20){
png(paste0("../sfa_outputs/GTEX2013_transpose/sfa-figures/voom_sparse_fac_loadings/gtex_sfa_loadings_",k,".png"), width=4, height=4, units="in", res=600)
# Wide bottom margin leaves room for the rotated tissue labels
par(mar=c(10,3,2,2))
barplot(t(f_out)[,k], axisnames=F,space=0,border=NA,
        main=paste0("SFA on gtex expression: loading:", k),
        las=1, cex.axis=0.3,cex.main=0.4,
        ylim=c(min(f_out[k,]),max(f_out[k,])))
axis(1,at=cumsum_mean,unique(tissue_labels),las=2, cex.axis=0.3);
abline(v=cumsum_high)
dev.off()
}
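
The tissue boundaries in the plots above come from `table(tissue_labels)`, which orders tissues by sorted level, while the axis labels come from `unique(tissue_labels)`, which keeps order of appearance. The two agree only if the samples are already sorted by tissue in that same order, which is an assumption about `samples_id.txt` that is not checked here. A minimal sketch of an alternative, using toy labels (hypothetical, not the GTEx sample sheet), derives boundaries and labels from the same run-length encoding so they cannot drift apart:

```r
# Toy labels standing in for samples_id[, 3] (hypothetical values).
tissue_labels <- c("Lung", "Lung", "Blood", "Blood", "Blood", "Heart")

# Run-length encoding gives contiguous blocks in order of appearance.
runs <- rle(as.character(tissue_labels))

# With barplot(..., space = 0) and unit-width bars, bar i spans [i - 1, i],
# so a block ends at the cumulative count and the first block starts at 0.
block_end   <- cumsum(runs$lengths)
block_start <- c(0, head(block_end, -1))
block_mid   <- 0.5 * (block_start + block_end)

# Labels and positions now come from the same object:
#   axis(1, at = block_mid, labels = runs$values, las = 2)
#   abline(v = block_end)
print(data.frame(tissue = runs$values, start = block_start,
                 end = block_end, mid = block_mid))
```

If the samples of one tissue are not contiguous, `rle()` returns that tissue more than once, which makes the ordering problem visible instead of silently mislabelling the axis.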

Factor 1 Annotations

out <- mygene::queryMany(gene_list[1,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id name summary symbol query notfound
27063 ankyrin repeat domain 1 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. ANKRD1 ENSG00000148677 NA
26287 ankyrin repeat domain 2 This gene encodes a protein that belongs to the muscle ankyrin repeat protein (MARP) family. A similar gene in rodents is a component of a muscle stress response pathway and plays a role in the stretch-response associated with slow muscle function. Alternative splicing results in multiple transcript variants encoding different isoforms. ANKRD2 ENSG00000165887 NA
58 actin, alpha 1, skeletal muscle The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. ACTA1 ENSG00000143632 NA
7134 troponin C1, slow skeletal and cardiac type Troponin is a central regulatory protein of striated muscle contraction, and together with tropomyosin, is located on the actin filament. Troponin consists of 3 subunits: TnI, which is the inhibitor of actomyosin ATPase; TnT, which contains the binding site for tropomyosin; and TnC, the protein encoded by this gene. The binding of calcium to TnC abolishes the inhibitory action of TnI, thus allowing the interaction of actin with myosin, the hydrolysis of ATP, and the generation of tension. Mutations in this gene are associated with cardiomyopathy dilated type 1Z. TNNC1 ENSG00000114854 NA
ENSG00000215861 NA NA WI2-1896O14.1 ENSG00000215861 NA
4151 myoglobin This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. MB ENSG00000198125 NA
4625 myosin, heavy chain 7, cardiac muscle, beta Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. MYH7 ENSG00000092054 NA
933 CD22 molecule NA CD22 ENSG00000012124 NA
ENSG00000258444 NA NA CTD-2201G16.1 ENSG00000258444 NA
81704 dedicator of cytokinesis 8 This gene encodes a member of the DOCK180 family of guanine nucleotide exchange factors. Guanine nucleotide exchange factors interact with Rho GTPases and are components of intracellular signaling networks. Mutations in this gene result in the autosomal recessive form of the hyper-IgE syndrome. Alternatively spliced transcript variants encoding different isoforms have been described. DOCK8 ENSG00000107099 NA
4633 myosin light chain 2 This gene encodes the regulatory light chain associated with cardiac myosin beta (or slow) heavy chain. Ca2+ triggers the phosphorylation of regulatory light chain that in turn triggers contraction. Mutations in this gene are associated with mid-left ventricular chamber type hypertrophic cardiomyopathy. MYL2 ENSG00000111245 NA
1339 cytochrome c oxidase subunit 6A2 Cytochrome c oxidase (COX), the terminal enzyme of the mitochondrial respiratory chain, catalyzes the electron transfer from reduced cytochrome c to oxygen. It is a heteromeric complex consisting of 3 catalytic subunits encoded by mitochondrial genes and multiple structural subunits encoded by nuclear genes. The mitochondrially-encoded subunits function in electron transfer, and the nuclear-encoded subunits may be involved in the regulation and assembly of the complex. This nuclear gene encodes polypeptide 2 (heart/muscle isoform) of subunit VIa, and polypeptide 2 is present only in striated muscles. Polypeptide 1 (liver isoform) of subunit VIa is encoded by a different gene, and is found in all non-muscle tissues. These two polypeptides share 66% amino acid sequence identity. COX6A2 ENSG00000156885 NA
8048 cysteine and glycine rich protein 3 This gene encodes a member of the CSRP family of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this protein is found in a group of proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Mutations in this gene are thought to cause heritable forms of hypertrophic cardiomyopathy (HCM) and dilated cardiomyopathy (DCM) in humans. Alternatively spliced transcript variants with different 5’ UTR, but encoding the same protein, have been found for this gene. CSRP3 ENSG00000129170 NA
5243 ATP binding cassette subfamily B member 1 The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intra-cellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the MDR/TAP subfamily. Members of the MDR/TAP subfamily are involved in multidrug resistance. The protein encoded by this gene is an ATP-dependent drug efflux pump for xenobiotic compounds with broad substrate specificity. It is responsible for decreased drug accumulation in multidrug-resistant cells and often mediates the development of resistance to anticancer drugs. This protein also functions as a transporter in the blood-brain barrier. ABCB1 ENSG00000085563 NA
1158 creatine kinase, M-type The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. CKM ENSG00000104879 NA
9244 cytokine receptor like factor 1 This gene encodes a member of the cytokine type I receptor family. The protein forms a secreted complex with cardiotrophin-like cytokine factor 1 and acts on cells expressing ciliary neurotrophic factor receptors. The complex can promote survival of neuronal cells. Mutations in this gene result in Crisponi syndrome and cold-induced sweating syndrome. CRLF1 ENSG00000006016 NA
70 actin, alpha, cardiac muscle 1 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). ACTC1 ENSG00000159251 NA
8557 titin-cap Sarcomere assembly is regulated by the muscle protein titin. Titin is a giant elastic protein with kinase activity that extends half the length of a sarcomere. It serves as a scaffold to which myofibrils and other muscle related proteins are attached. This gene encodes a protein found in striated and cardiac muscle that binds to the titin Z1-Z2 domains and is a substrate of titin kinase, interactions thought to be critical to sarcomere assembly. Mutations in this gene are associated with limb-girdle muscular dystrophy type 2G. TCAP ENSG00000173991 NA
81621 Kazal type serine peptidase inhibitor domain 1 This gene encodes a secreted member of the insulin growth factor-binding protein (IGFBP) superfamily. The protein contains an insulin growth factor-binding domain in its N-terminal region, a Kazal-type serine protease inhibitor and follistatin-like domain in its central region, and an immunoglobulin-like domain in its C-terminal region. Studies of the mouse ortholog suggest that this protein may function in bone development and bone regeneration. This gene is hypomethylated and over-expressed in high-grade glioma compared to low-grade glioma, and thus the hypomethylated gene may be associated with cell proliferation and the shorter survival of patients with high-grade glioma. It is also one of numerous genes found to be deleted in a novel 5.54 Mb interstitial deletion, which is associated with multiple congenital anomalies. Alternative splicing results in multiple transcript variants. KAZALD1 ENSG00000107821 NA
113622 ADP-ribosylhydrolase like 1 ADP-ribosylation is a reversible posttranslational modification used to regulate protein function. ADP-ribosyltransferases (see ART1; MIM 601625) transfer ADP-ribose from NAD+ to the target protein, and ADP-ribosylhydrolases, such as ADPRHL1, reverse the reaction (Glowacki et al., 2002 [PubMed 12070318]). ADPRHL1 ENSG00000153531 NA
1880 G protein-coupled receptor 183 This gene was identified by the up-regulation of its expression upon Epstein-Barr virus infection of primary B lymphocytes. This gene is predicted to encode a G protein-coupled receptor that is most closely related to the thrombin receptor. Expression of this gene was detected in B-lymphocyte cell lines and lymphoid tissues but not in T-lymphocyte cell lines or peripheral blood T lymphocytes. The function of this gene is unknown. GPR183 ENSG00000169508 NA
25861 whirlin This gene is thought to function in the organization and stabilization of stereocilia elongation and actin cytoskeletal assembly, based on studies of the related mouse gene. Mutations in this gene have been associated with autosomal recessive non-syndromic deafness and Usher Syndrome. Alternative splicing of this gene results in multiple transcript variants encoding different isoforms. WHRN ENSG00000095397 NA
9358 integrin subunit beta like 1 This gene encodes a beta integrin-related protein that is a member of the EGF-like protein family. The encoded protein contains integrin-like cysteine-rich repeats. Alternative splicing results in multiple transcript variants. ITGBL1 ENSG00000198542 NA
NA NA NA NA ENSG00000180672 TRUE
684 bone marrow stromal cell antigen 2 Bone marrow stromal cells are involved in the growth and development of B-cells. The specific function of the protein encoded by the bone marrow stromal cell antigen 2 is undetermined; however, this protein may play a role in pre-B-cell growth and in rheumatoid arthritis. BST2 ENSG00000130303 NA
51778 myozenin 2 The protein encoded by this gene belongs to a family of sarcomeric proteins that bind to calcineurin, a phosphatase involved in calcium-dependent signal transduction in diverse cell types. These family members tether calcineurin to alpha-actinin at the z-line of the sarcomere of cardiac and skeletal muscle cells, and thus they are important for calcineurin signaling. Mutations in this gene cause cardiomyopathy familial hypertrophic type 16, a hereditary heart disorder. MYOZ2 ENSG00000172399 NA
5774 protein tyrosine phosphatase, non-receptor type 3 The protein encoded by this gene is a member of the protein tyrosine phosphatase (PTP) family. PTPs are known to be signaling molecules that regulate a variety of cellular processes including cell growth, differentiation, mitotic cycle, and oncogenic transformation. This protein contains a C-terminal PTP domain and an N-terminal domain homologous to the band 4.1 superfamily of cytoskeletal-associated proteins. P97, a cell cycle regulator involved in a variety of membrane related functions, has been shown to be a substrate of this PTP. This PTP was also found to interact with, and be regulated by adaptor protein 14-3-3 beta. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene. PTPN3 ENSG00000070159 NA
ENSG00000245694 colorectal neoplasia differentially expressed (non-protein coding) NA CRNDE ENSG00000245694 NA
286 ankyrin 1 Ankyrins are a family of proteins that link the integral membrane proteins to the underlying spectrin-actin cytoskeleton and play key roles in activities such as cell motility, activation, proliferation, contact and the maintenance of specialized membrane domains. Multiple isoforms of ankyrin with different affinities for various target proteins are expressed in a tissue-specific, developmentally regulated manner. Most ankyrins are typically composed of three structural domains: an amino-terminal domain containing multiple ankyrin repeats; a central region with a highly conserved spectrin binding domain; and a carboxy-terminal regulatory domain which is the least conserved and subject to variation. Ankyrin 1, the prototype of this family, was first discovered in the erythrocytes, but since has also been found in brain and muscles. Mutations in erythrocytic ankyrin 1 have been associated in approximately half of all patients with hereditary spherocytosis. Complex patterns of alternative splicing in the regulatory domain, giving rise to different isoforms of ankyrin 1 have been described. Truncated muscle-specific isoforms of ankyrin 1 resulting from usage of an alternate promoter have also been identified. ANK1 ENSG00000029534 NA
4892 nebulin related anchoring protein NA NRAP ENSG00000197893 NA
105370792 uncharacterized LOC105370792 NA LOC105370792 ENSG00000174171 NA
56901 NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 4-like 2 NA NDUFA4L2 ENSG00000185633 NA
654790 Purkinje cell protein 4 like 1 NA PCP4L1 ENSG00000248485 NA
2634 guanylate binding protein 2 This gene belongs to the guanine-binding protein (GBP) family, which includes interferon-induced proteins that can bind to guanine nucleotides (GMP, GDP and GTP). The encoded protein is a GTPase which hydrolyzes GTP, predominantly to GDP. The protein may play a role as a marker of squamous cell carcinomas. GBP2 ENSG00000162645 NA
1612 death associated protein kinase 1 Death-associated protein kinase 1 is a positive mediator of gamma-interferon induced programmed cell death. DAPK1 encodes a structurally unique 160-kD calmodulin dependent serine-threonine kinase that carries 8 ankyrin repeats and 2 putative P-loop consensus sites. It is a tumor suppressor candidate. Alternative splicing results in multiple transcript variants. DAPK1 ENSG00000196730 NA
54795 transient receptor potential cation channel subfamily M member 4 The protein encoded by this gene is a calcium-activated nonselective ion channel that mediates transport of monovalent cations across membranes, thereby depolarizing the membrane. The activity of the encoded protein increases with increasing intracellular calcium concentration, but this channel does not transport calcium. TRPM4 ENSG00000130529 NA
56241 sushi domain containing 2 NA SUSD2 ENSG00000099994 NA
NA NA NA NA ENSG00000269640 TRUE
ENSG00000250654 NA NA RP11-834C11.7 ENSG00000250654 NA
9659 phosphodiesterase 4D interacting protein The protein encoded by this gene serves to anchor phosphodiesterase 4D to the Golgi/centrosome region of the cell. Defects in this gene may be a cause of myeloproliferative disorder (MBD) associated with eosinophilia. Several transcript variants encoding different isoforms have been found for this gene. PDE4DIP ENSG00000178104 NA
ENSG00000250900 NA NA CTC-338M12.6 ENSG00000250900 NA
65997 RAS like family 11 member B RASL11B is a member of the small GTPase protein family with a high degree of similarity to RAS (see HRAS, MIM 190020) proteins. RASL11B ENSG00000128045 NA
132014 interleukin 17 receptor E This gene encodes a transmembrane protein that functions as the receptor for interleukin-17C. The encoded protein signals to downstream components of the mitogen activated protein kinase (MAPK) pathway. Activity of this protein is important in the immune response to bacterial pathogens. Alternatively spliced transcript variants have been described for this gene. IL17RE ENSG00000163701 NA
8828 neuropilin 2 This gene encodes a member of the neuropilin family of receptor proteins. The encoded transmembrane protein binds to SEMA3C protein {sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3C} and SEMA3F protein {sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3F}, and interacts with vascular endothelial growth factor (VEGF). This protein may play a role in cardiovascular development, axon guidance, and tumorigenesis. Multiple transcript variants encoding distinct isoforms have been identified for this gene. NRP2 ENSG00000118257 NA
57159 tripartite motif containing 54 The protein encoded by this gene contains a RING finger motif and is highly similar to the ring finger proteins RNF28/MURF1 and RNF29/MURF2. In vitro studies demonstrated that this protein, RNF28, and RNF29 form heterodimers, which may be important for the regulation of titin kinase and microtubule-dependent signal pathways in striated muscles. Alternatively spliced transcript variants encoding distinct isoforms have been reported. TRIM54 ENSG00000138100 NA
1346 cytochrome c oxidase subunit 7A1 Cytochrome c oxidase (COX), the terminal component of the mitochondrial respiratory chain, catalyzes the electron transfer from reduced cytochrome c to oxygen. This component is a heteromeric complex consisting of 3 catalytic subunits encoded by mitochondrial genes and multiple structural subunits encoded by nuclear genes. The mitochondrially-encoded subunits function in electron transfer, and the nuclear-encoded subunits may function in the regulation and assembly of the complex. This nuclear gene encodes polypeptide 1 (muscle isoform) of subunit VIIa and the polypeptide 1 is present only in muscle tissues. Other polypeptides of subunit VIIa are present in both muscle and nonmuscle tissues, and are encoded by different genes. COX7A1 ENSG00000161281 NA
7138 troponin T1, slow skeletal type This gene encodes a protein that is a subunit of troponin, which is a regulatory complex located on the thin filament of the sarcomere. This complex regulates striated muscle contraction in response to fluctuations in intracellular calcium concentration. This complex is composed of three subunits: troponin C, which binds calcium, troponin T, which binds tropomyosin, and troponin I, which is an inhibitory subunit. This protein is the slow skeletal troponin T subunit. Mutations in this gene cause nemaline myopathy type 5, also known as Amish nemaline myopathy, a neuromuscular disorder characterized by muscle weakness and rod-shaped, or nemaline, inclusions in skeletal muscle fibers which affects infants, resulting in death due to respiratory insufficiency, usually in the second year. Multiple transcript variants encoding different isoforms have been found for this gene. TNNT1 ENSG00000105048 NA
10231 regulator of calcineurin 2 This gene encodes a member of the regulator of calcineurin (RCAN) protein family. These proteins play a role in many physiological processes by binding to the catalytic domain of calcineurin A, inhibiting calcineurin-mediated nuclear translocation of the transcription factor NFATC1. Expression of this gene in skin fibroblasts is upregulated by thyroid hormone, and the encoded protein may also play a role in endothelial cell function and angiogenesis. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. RCAN2 ENSG00000172348 NA
81786 tripartite motif containing 7 The protein encoded by this gene is a member of the tripartite motif (TRIM) family. The TRIM motif includes three zinc-binding domains, a RING, a B-box type 1, a B-box type 2, and a coiled-coil region. The protein localizes to both the nucleus and the cytoplasm, and may represent a participant in the initiation of glycogen synthesis. Alternative splicing results in multiple transcript variants. TRIM7 ENSG00000146054 NA
1410 crystallin alpha B Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. CRYAB ENSG00000109846 NA
5341 pleckstrin NA PLEK ENSG00000115956 NA
9770 Ras association domain family member 2 This gene encodes a protein that contains a Ras association domain. Similar to its cattle and sheep counterparts, this gene is located near the prion gene. Two alternatively spliced transcripts encoding the same isoform have been reported. RASSF2 ENSG00000101265 NA
4053 latent transforming growth factor beta binding protein 2 The protein encoded by this gene belongs to the family of latent transforming growth factor (TGF)-beta binding proteins (LTBP), which are extracellular matrix proteins with multi-domain structure. This protein is the largest member of the LTBP family possessing unique regions and with most similarity to the fibrillins. It has thus been suggested that it may have multiple functions: as a member of the TGF-beta latent complex, as a structural component of microfibrils, and a role in cell adhesion. LTBP2 ENSG00000119681 NA
ENSG00000225217 heat shock protein family A (Hsp70) member 7 NA HSPA7 ENSG00000225217 NA
2170 fatty acid binding protein 3 The intracellular fatty acid-binding proteins (FABPs) belong to a multigene family. FABPs are divided into at least three distinct types, namely the hepatic-, intestinal- and cardiac-type. They form 14-15 kDa proteins and are thought to participate in the uptake, intracellular metabolism and/or transport of long-chain fatty acids. They may also be responsible for the modulation of cell growth and proliferation. Fatty acid-binding protein 3 gene contains four exons and its function is to arrest growth of mammary epithelial cells. This gene is a candidate tumor suppressor gene for human breast cancer. Alternative splicing results in multiple transcript variants. FABP3 ENSG00000121769 NA
157310 phosphatidylethanolamine binding protein 4 The phosphatidylethanolamine (PE)-binding proteins, including PEBP4, are an evolutionarily conserved family of proteins with pivotal biologic functions, such as lipid binding and inhibition of serine proteases (Wang et al., 2004 [PubMed 15302887]). PEBP4 ENSG00000134020 NA
10382 tubulin beta 4A class IVa This gene encodes a member of the beta tubulin family. Beta tubulins are one of two core protein families (alpha and beta tubulins) that heterodimerize and assemble to form microtubules. Mutations in this gene cause hypomyelinating leukodystrophy-6 and autosomal dominant torsion dystonia-4. Alternate splicing results in multiple transcript variants encoding different isoforms. A pseudogene of this gene is found on chromosome X. TUBB4A ENSG00000104833 NA
11245 G protein-coupled receptor 176 Members of the G protein-coupled receptor family, such as GPR176, are cell surface receptors involved in responses to hormones, growth factors, and neurotransmitters (Hata et al., 1995 [PubMed 7893747]). GPR176 ENSG00000166073 NA
27122 dickkopf WNT signaling pathway inhibitor 3 This gene encodes a protein that is a member of the dickkopf family. The secreted protein contains two cysteine rich regions and is involved in embryonic development through its interactions with the Wnt signaling pathway. The expression of this gene is decreased in a variety of cancer cell lines and it may function as a tumor suppressor gene. Alternative splicing results in multiple transcript variants encoding the same protein. DKK3 ENSG00000050165 NA
130827 transmembrane protein 182 NA TMEM182 ENSG00000170417 NA
51330 tumor necrosis factor receptor superfamily member 12A NA TNFRSF12A ENSG00000006327 NA
5662 pleckstrin and Sec7 domain containing This gene encodes a Pleckstrin homology and SEC7 domains-containing protein that functions as a guanine nucleotide exchange factor. The encoded protein regulates signal transduction by activating ADP-ribosylation factor 6. Alternative splicing results in multiple transcript variants. PSD ENSG00000059915 NA
3684 integrin subunit alpha M This gene encodes the integrin alpha M chain. Integrins are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain. This I-domain containing alpha integrin combines with the beta 2 chain (ITGB2) to form a leukocyte-specific integrin referred to as macrophage receptor 1 (‘Mac-1’), or inactivated-C3b (iC3b) receptor 3 (‘CR3’). The alpha M beta 2 integrin is important in the adherence of neutrophils and monocytes to stimulated endothelium, and also in the phagocytosis of complement coated particles. Multiple transcript variants encoding different isoforms have been found for this gene. ITGAM ENSG00000169896 NA
5996 regulator of G-protein signaling 1 This gene encodes a member of the regulator of G-protein signalling family. This protein is located on the cytosolic side of the plasma membrane and contains a conserved, 120 amino acid motif called the RGS domain. The protein attenuates the signalling activity of G-proteins by binding to activated, GTP-bound G alpha subunits and acting as a GTPase activating protein (GAP), increasing the rate of conversion of the GTP to GDP. This hydrolysis allows the G alpha subunits to bind G beta/gamma subunit heterodimers, forming inactive G-protein heterotrimers, thereby terminating the signal. RGS1 ENSG00000090104 NA
8787 regulator of G-protein signaling 9 This gene encodes a member of the RGS family of GTPase activating proteins that function in various signaling pathways by accelerating the deactivation of G proteins. This protein is anchored to photoreceptor membranes in retinal cells and deactivates G proteins in the rod and cone phototransduction cascades. Mutations in this gene result in bradyopsia. Multiple transcript variants encoding different isoforms have been found for this gene. RGS9 ENSG00000108370 NA
ENSG00000272463 NA NA RP11-532F6.3 ENSG00000272463 NA
29995 LIM and cysteine rich domains 1 This gene encodes a member of the LIM-domain family of zinc finger proteins. The encoded protein contains an N-terminal cysteine-rich domain and two C-terminal LIM domains. The presence of LIM domains suggests involvement in protein-protein interactions. The protein may act as a co-regulator of transcription along with other transcription factors. Alternate splicing results in multiple transcript variants of this gene. LMCD1 ENSG00000071282 NA
2621 growth arrest specific 6 This gene encodes a gamma-carboxyglutamic acid (Gla)-containing protein thought to be involved in the stimulation of cell proliferation. This gene is frequently overexpressed in many cancers and has been implicated as an adverse prognostic marker. Elevated protein levels are additionally associated with a variety of disease states, including venous thromboembolic disease, systemic lupus erythematosus, chronic renal failure, and preeclampsia. GAS6 ENSG00000183087 NA
100507002 uncharacterized LOC100507002 NA LOC100507002 ENSG00000263470 NA
4624 myosin, heavy chain 6, cardiac muscle, alpha Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. MYH6 ENSG00000197616 NA
963 CD53 molecule The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. This encoded protein is a cell surface glycoprotein that is known to complex with integrins. It contributes to the transduction of CD2-generated signals in T cells and natural killer cells and has been suggested to play a role in growth regulation. Familial deficiency of this gene has been linked to an immunodeficiency associated with recurrent infectious diseases caused by bacteria, fungi and viruses. Alternative splicing results in multiple transcript variants. CD53 ENSG00000143119 NA
482 ATPase Na+/K+ transporting subunit beta 2 The protein encoded by this gene belongs to the family of Na+/K+ and H+/K+ ATPases beta chain proteins, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The beta subunit regulates, through assembly of alpha/beta heterodimers, the number of sodium pumps transported to the plasma membrane. The glycoprotein subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes a beta 2 subunit. Two transcript variants encoding different isoforms have been found for this gene. ATP1B2 ENSG00000129244 NA
1160 creatine kinase, mitochondrial 2 Mitochondrial creatine kinase (MtCK) is responsible for the transfer of high energy phosphate from mitochondria to the cytosolic carrier, creatine. It belongs to the creatine kinase isoenzyme family. It exists as two isoenzymes, sarcomeric MtCK and ubiquitous MtCK, encoded by separate genes. Mitochondrial creatine kinase occurs in two different oligomeric forms: dimers and octamers, in contrast to the exclusively dimeric cytosolic creatine kinase isoenzymes. Sarcomeric mitochondrial creatine kinase has 80% homology with the coding exons of ubiquitous mitochondrial creatine kinase. This gene contains sequences homologous to several motifs that are shared among some nuclear genes encoding mitochondrial proteins and thus may be essential for the coordinated activation of these genes during mitochondrial biogenesis. Three transcript variants encoding the same protein have been found for this gene. CKMT2 ENSG00000131730 NA
283248 REST corepressor 2 NA RCOR2 ENSG00000167771 NA
ENSG00000225792 NA NA AC004540.4 ENSG00000225792 NA
7273 titin This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, an N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. TTN ENSG00000155657 NA
NA NA NA NA ENSG00000272003 TRUE
ENSG00000254539 NA NA RP4-791M13.3 ENSG00000254539 NA
816 calcium/calmodulin dependent protein kinase II beta The product of this gene belongs to the serine/threonine protein kinase family and to the Ca(2+)/calmodulin-dependent protein kinase subfamily. Calcium signaling is crucial for several aspects of plasticity at glutamatergic synapses. In mammalian cells, the enzyme is composed of four different chains: alpha, beta, gamma, and delta. The product of this gene is a beta chain. It is possible that distinct isoforms of this chain have different cellular localizations and interact differently with calmodulin. Alternative splicing results in multiple transcript variants. CAMK2B ENSG00000058404 NA
51363 carbohydrate (N-acetylgalactosamine 4-sulfate 6-O) sulfotransferase 15 Chondroitin sulfate (CS) is a glycosaminoglycan which is an important structural component of the extracellular matrix and which links to proteins to form proteoglycans. Chondroitin sulfate E (CS-E) is an isomer of chondroitin sulfate in which the C-4 and C-6 hydroxyl groups are sulfated. This gene encodes a type II transmembrane glycoprotein that acts as a sulfotransferase to transfer sulfate to the C-6 hydroxyl group of chondroitin sulfate. This gene has also been identified as being co-expressed with RAG1 in B-cells and as potentially acting as a B-cell surface signaling receptor. Alternative splicing results in multiple transcript variants encoding distinct isoforms. CHST15 ENSG00000182022 NA
8497 PTPRF interacting protein alpha 4 PPFIA4, or liprin-alpha-4, belongs to the liprin-alpha gene family. See liprin-alpha-1 (LIP1, or PPFIA1; MIM 611054) for background on liprins. PPFIA4 ENSG00000143847 NA
3936 lymphocyte cytosolic protein 1 Plastins are a family of actin-binding proteins that are conserved throughout eukaryote evolution and expressed in most tissues of higher eukaryotes. In humans, two ubiquitous plastin isoforms (L and T) have been identified. Plastin 1 (otherwise known as Fimbrin) is a third distinct plastin isoform which is specifically expressed at high levels in the small intestine. The L isoform is expressed only in hemopoietic cell lineages, while the T isoform has been found in all other normal cells of solid tissues that have replicative potential (fibroblasts, endothelial cells, epithelial cells, melanocytes, etc.). However, L-plastin has been found in many types of malignant human cells of non-hemopoietic origin suggesting that its expression is induced accompanying tumorigenesis in solid tissues. LCP1 ENSG00000136167 NA
1946 ephrin A5 Ephrin-A5, a member of the ephrin gene family, prevents axon bundling in cocultures of cortical neurons with astrocytes, a model of late stage nervous system development and differentiation. The EPH and EPH-related receptors comprise the largest subfamily of receptor protein-tyrosine kinases and have been implicated in mediating developmental events, particularly in the nervous system. EPH receptors typically have a single kinase domain and an extracellular region containing a Cys-rich domain and 2 fibronectin type III repeats. The ephrin ligands and receptors have been named by the Eph Nomenclature Committee (1997). Based on their structures and sequence relationships, ephrins are divided into the ephrin-A (EFNA) class, which are anchored to the membrane by a glycosylphosphatidylinositol linkage, and the ephrin-B (EFNB) class, which are transmembrane proteins. The Eph family of receptors are similarly divided into 2 groups based on the similarity of their extracellular domain sequences and their affinities for binding ephrin-A and ephrin-B ligands. EFNA5 ENSG00000184349 NA
2817 glypican 1 Cell surface heparan sulfate proteoglycans are composed of a membrane-associated protein core substituted with a variable number of heparan sulfate chains. Members of the glypican-related integral membrane proteoglycan family (GRIPS) contain a core protein anchored to the cytoplasmic membrane via a glycosyl phosphatidylinositol linkage. These proteins may play a role in the control of cell division and growth regulation. GPC1 ENSG00000063660 NA
404217 cortexin 1 NA CTXN1 ENSG00000178531 NA
53826 FXYD domain containing ion transport regulator 6 This gene encodes a member of the FXYD family of transmembrane proteins. This particular protein encodes phosphohippolin, which likely affects the activity of Na,K-ATPase. Multiple alternatively spliced transcript variants encoding the same protein have been described. Related pseudogenes have been identified on chromosomes 10 and X. Read-through transcripts have been observed between this locus and the downstream sodium/potassium-transporting ATPase subunit gamma (FXYD2, GeneID 486) locus. FXYD6 ENSG00000137726 NA
4015 lysyl oxidase This gene encodes a member of the lysyl oxidase family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate a regulatory propeptide and the mature enzyme. The copper-dependent amine oxidase activity of this enzyme functions in the crosslinking of collagens and elastin, while the propeptide may play a role in tumor suppression. LOX ENSG00000113083 NA
27124 inositol polyphosphate-5-phosphatase J NA INPP5J ENSG00000185133 NA
10536 prolyl 3-hydroxylase 3 The protein encoded by this gene belongs to the leprecan family of proteoglycans, which function as collagen prolyl hydroxylases that are required for proper collagen biosynthesis, folding and assembly. This protein, like other family members, is thought to reside in the endoplasmic reticulum. Epigenetic inactivation of this gene is associated with breast and other cancers, suggesting that it may function as a tumor suppressor. P3H3 ENSG00000110811 NA
1474 cystatin E/M The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and the kininogens. The type 2 cystatin proteins are a class of cysteine proteinase inhibitors found in a variety of human fluids and secretions, where they appear to provide protective functions. This gene encodes a cystatin from the type 2 family, which is down-regulated in metastatic breast tumor cells as compared to primary tumor cells. Loss of expression is likely associated with the progression of a primary tumor to a metastatic phenotype. CST6 ENSG00000175315 NA
5336 phospholipase C gamma 2 The protein encoded by this gene is a transmembrane signaling enzyme that catalyzes the conversion of 1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate to 1D-myo-inositol 1,4,5-trisphosphate (IP3) and diacylglycerol (DAG) using calcium as a cofactor. IP3 and DAG are second messenger molecules important for transmitting signals from growth factor receptors and immune system receptors across the cell membrane. Mutations in this gene have been found in autoinflammation, antibody deficiency, and immune dysregulation syndrome and familial cold autoinflammatory syndrome 3. PLCG2 ENSG00000197943 NA
90993 cAMP responsive element binding protein 3 like 1 The protein encoded by this gene is normally found in the membrane of the endoplasmic reticulum (ER). However, upon stress to the ER, the encoded protein is cleaved and the released cytoplasmic transcription factor domain translocates to the nucleus. There it activates the transcription of target genes by binding to box-B elements. CREB3L1 ENSG00000157613 NA
ENSG00000200278 RNA, 5S ribosomal pseudogene 352 NA RNA5SP352 ENSG00000200278 NA
1397 cysteine rich protein 2 This gene encodes a putative transcription factor with two LIM zinc-binding domains. The encoded protein may participate in the differentiation of smooth muscle tissue. Alternative splicing results in multiple transcript variants. CRIP2 ENSG00000182809 NA
NA NA NA NA ENSG00000203691 TRUE
51088 kelch like family member 5 NA KLHL5 ENSG00000109790 NA
4634 myosin light chain 3 MYL3 encodes myosin light chain 3, an alkali light chain also referred to in the literature as both the ventricular isoform and the slow skeletal muscle isoform. Mutations in MYL3 have been identified as a cause of mid-left ventricular chamber type hypertrophic cardiomyopathy. MYL3 ENSG00000160808 NA
54751 filamin binding LIM protein 1 This gene encodes a protein with an N-terminal filamin-binding domain, a central proline-rich domain, and, multiple C-terminal LIM domains. This protein localizes at cell junctions and may link cell adhesion structures to the actin cytoskeleton. This protein may be involved in the assembly and stabilization of actin-filaments and likely plays a role in modulating cell adhesion, cell morphology and cell motility. This protein also localizes to the nucleus and may affect cardiomyocyte differentiation after binding with the CSX/NKX2-5 transcription factor. Alternative splicing results in multiple transcript variants encoding different isoforms. FBLIM1 ENSG00000162458 NA
1294 collagen type VII alpha 1 This gene encodes the alpha chain of type VII collagen. The type VII collagen fibril, composed of three identical alpha collagen chains, is restricted to the basement zone beneath stratified squamous epithelia. It functions as an anchoring fibril between the external epithelia and the underlying stroma. Mutations in this gene are associated with all forms of dystrophic epidermolysis bullosa. In the absence of mutations, however, an acquired form of this disease can result from an autoimmune response made to type VII collagen. COL7A1 ENSG00000114270 NA
4689 neutrophil cytosolic factor 4 The protein encoded by this gene is a cytosolic regulatory component of the superoxide-producing phagocyte NADPH-oxidase, a multicomponent enzyme system important for host defense. This protein is preferentially expressed in cells of myeloid lineage. It interacts primarily with neutrophil cytosolic factor 2 (NCF2/p67-phox) to form a complex with neutrophil cytosolic factor 1 (NCF1/p47-phox), which further interacts with the small G protein RAC1 and translocates to the membrane upon cell stimulation. This complex then activates flavocytochrome b, the membrane-integrated catalytic core of the enzyme system. The PX domain of this protein can bind phospholipid products of the PI(3) kinase, which suggests its role in PI(3) kinase-mediated signaling events. The phosphorylation of this protein was found to negatively regulate the enzyme activity. Alternatively spliced transcript variants encoding distinct isoforms have been observed. NCF4 ENSG00000100365 NA
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 1, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
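
The Factor 1 table above contains queries that mygene could not match (for example ENSG00000272003 and ENSG00000203691, where notfound is TRUE), and the write.table call writes every queried ID regardless. If only matched genes are wanted in the output file, a minimal sketch could look like the following. The helper name `write_matched_genes` and the toy data frame are hypothetical and not part of the original analysis; the sketch assumes the result has a `query` column and, when some queries fail, a `notfound` column, as in the tables shown here.

```r
# Hypothetical helper (not part of the original analysis): keep only the
# queried Ensembl IDs that were matched, then write them one per line.
write_matched_genes <- function(out, path) {
  out <- as.data.frame(out)
  # In the tables above, `notfound` is TRUE for unmatched queries and NA
  # otherwise. Guard against the column being absent altogether.
  if ("notfound" %in% names(out)) {
    unmatched <- out$notfound %in% TRUE
  } else {
    unmatched <- rep(FALSE, nrow(out))
  }
  matched <- unique(as.character(out$query[!unmatched]))
  write.table(matched, path,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(matched)
}

# Toy stand-in for a queryMany() result, built from rows of the Factor 1 table.
toy_out <- data.frame(
  query    = c("ENSG00000155657", "ENSG00000272003", "ENSG00000160808"),
  symbol   = c("TTN", NA, "MYL3"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)
toy_path <- tempfile(fileext = ".txt")
kept <- write_matched_genes(toy_out, toy_path)
```

In the report itself, the call would take the real `out` object and the `gene_names_clus_` path in place of the toy inputs. Whether unmatched IDs should be dropped depends on what the downstream tool reading these files expects.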

Factor 2 Annotations

out <- mygene::queryMany(gene_list[2, ], scopes = "ensembl.gene",
                         fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id query name summary notfound
MME 4311 ENSG00000196549 membrane metallo-endopeptidase This gene encodes a common acute lymphocytic leukemia antigen that is an important cell surface marker in the diagnosis of human acute lymphocytic leukemia (ALL). This protein is present on leukemic cells of pre-B phenotype, which represent 85% of cases of ALL. This protein is not restricted to leukemic cells, however, and is found on a variety of normal tissues. It is a glycoprotein that is particularly abundant in kidney, where it is present on the brush border of proximal tubules and on glomerular epithelium. The protein is a neutral endopeptidase that cleaves peptides at the amino side of hydrophobic residues and inactivates several peptide hormones including glucagon, enkephalins, substance P, neurotensin, oxytocin, and bradykinin. This gene, which encodes a 100-kD type II transmembrane glycoprotein, exists in a single copy of greater than 45 kb. The 5’ untranslated region of this gene is alternatively spliced, resulting in four separate mRNA transcripts. The coding region is not affected by alternative splicing. NA
NRCAM 4897 ENSG00000091129 neuronal cell adhesion molecule Cell adhesion molecules (CAMs) are members of the immunoglobulin superfamily. This gene encodes a neuronal cell adhesion molecule with multiple immunoglobulin-like C2-type domains and fibronectin type-III domains. This ankyrin-binding protein is involved in neuron-neuron adhesion and promotes directional signaling during axonal cone growth. This gene is also expressed in non-neural tissues and may play a general role in cell-cell communication via signaling from its intracellular domain to the actin cytoskeleton during directional cell migration. Allelic variants of this gene have been associated with autism and addiction vulnerability. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
GSTA1 2938 ENSG00000243955 glutathione S-transferase alpha 1 This gene encodes a member of a family of enzymes that function to add glutathione to target electrophilic compounds, including carcinogens, therapeutic drugs, environmental toxins, and products of oxidative stress. This action is an important step in detoxification of these compounds. This subfamily of enzymes has a particular role in protecting cells from reactive oxygen species and the products of peroxidation. Polymorphisms in this gene influence the ability of individuals to metabolize different drugs. This gene is located in a cluster of similar genes and pseudogenes on chromosome 6. Alternative splicing results in multiple transcript variants. NA
FGA 2243 ENSG00000171560 fibrinogen alpha chain This gene encodes the alpha subunit of the coagulation factor fibrinogen, which is a component of the blood clot. Following vascular injury, the encoded preproprotein is proteolytically processed by thrombin during the conversion of fibrinogen to fibrin. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia, afibrinogenemia and renal amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. NA
AMBP 259 ENSG00000106927 alpha-1-microglobulin/bikunin precursor This gene encodes a complex glycoprotein secreted in plasma. The precursor is proteolytically processed into distinct functioning proteins: alpha-1-microglobulin, which belongs to the superfamily of lipocalin transport proteins and may play a role in the regulation of inflammatory processes, and bikunin, which is a urinary trypsin inhibitor belonging to the superfamily of Kunitz-type protease inhibitors and plays an important role in many physiological and pathological processes. This gene is located on chromosome 9 in a cluster of lipocalin genes. NA
FGB 2244 ENSG00000171564 fibrinogen beta chain The protein encoded by this gene is the beta component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including afibrinogenemia, dysfibrinogenemia, hypodysfibrinogenemia and thrombotic tendency. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
ANO1-AS1 ENSG00000254902 ENSG00000254902 ANO1 antisense RNA 1 NA NA
GADD45G 10912 ENSG00000130222 growth arrest and DNA damage inducible gamma This gene is a member of a group of genes whose transcript levels are increased following stressful growth arrest conditions and treatment with DNA-damaging agents. The protein encoded by this gene responds to environmental stresses by mediating activation of the p38/JNK pathway via MTK1/MEKK4 kinase. The GADD45G is highly expressed in placenta. NA
SORBS2 8470 ENSG00000154556 sorbin and SH3 domain containing 2 Arg and c-Abl represent the mammalian members of the Abelson family of non-receptor protein-tyrosine kinases. They interact with the Arg/Abl binding proteins via the SH3 domains present in the carboxy end of the latter group of proteins. This gene encodes the sorbin and SH3 domain containing 2 protein. It has three C-terminal SH3 domains and an N-terminal sorbin homology (SoHo) domain that interacts with lipid raft proteins. The subcellular localization of this protein in epithelial and cardiac muscle cells suggests that it functions as an adapter protein to assemble signaling complexes in stress fibers, and that it is a potential link between Abl family kinases and the actin cytoskeleton. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
ALB 213 ENSG00000163631 albumin Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. NA
COLEC11 78989 ENSG00000118004 collectin subfamily member 11 This gene encodes a member of the collectin family of C-type lectins that possess collagen-like sequences and carbohydrate recognition domains. Collectins are secreted proteins that play important roles in the innate immune system by binding to carbohydrate antigens on microorganisms, facilitating their recognition and removal. The encoded protein binds to multiple sugars with a preference for fucose and mannose. Mutations in this gene are a cause of 3MC syndrome-2. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
CADM3-AS1 ENSG00000225670 ENSG00000225670 CADM3 antisense RNA 1 NA NA
VTN 7448 ENSG00000109072 vitronectin The protein encoded by this gene is a member of the pexin family. It is found in serum and tissues and promotes cell adhesion and spreading, inhibits the membrane-damaging effect of the terminal cytolytic complement pathway, and binds to several serpin serine protease inhibitors. It is a secreted protein and exists in either a single chain form or a clipped, two chain form held together by a disulfide bond. NA
APOH 350 ENSG00000091583 apolipoprotein H Apolipoprotein H has been implicated in a variety of physiologic pathways including lipoprotein metabolism, coagulation, and the production of antiphospholipid autoantibodies. APOH may be a required cofactor for anionic phospholipid binding by the antiphospholipid autoantibodies found in sera of many patients with lupus and primary antiphospholipid syndrome, but it does not seem to be required for the reactivity of antiphospholipid autoantibodies associated with infections. NA
HAAO 23498 ENSG00000162882 3-hydroxyanthranilate 3,4-dioxygenase 3-Hydroxyanthranilate 3,4-dioxygenase is a monomeric cytosolic protein belonging to the family of intramolecular dioxygenases containing nonheme ferrous iron. It is widely distributed in peripheral organs, such as liver and kidney, and is also present in low amounts in the central nervous system. HAAO catalyzes the synthesis of quinolinic acid (QUIN) from 3-hydroxyanthranilic acid. QUIN is an excitotoxin whose toxicity is mediated by its ability to activate glutamate N-methyl-D-aspartate receptors. Increased cerebral levels of QUIN may participate in the pathogenesis of neurologic and inflammatory disorders. HAAO has been suggested to play a role in disorders associated with altered tissue levels of QUIN. NA
ITIH4-AS1 100873993 ENSG00000239799 ITIH4 antisense RNA 1 NA NA
CADM3 57863 ENSG00000162706 cell adhesion molecule 3 IGSF4B is a brain-specific protein related to the calcium-independent cell-cell adhesion molecules known as nectins (see PVRL3; MIM 607147) (Kakunaga et al., 2005 [PubMed 15741237]). NA
CYP2E1 1571 ENSG00000130649 cytochrome P450 family 2 subfamily E member 1 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum and is induced by ethanol, the diabetic state, and starvation. The enzyme metabolizes both endogenous substrates, such as ethanol, acetone, and acetal, as well as exogenous substrates including benzene, carbon tetrachloride, ethylene glycol, and nitrosamines which are premutagens found in cigarette smoke. Due to its many substrates, this enzyme may be involved in such varied processes as gluconeogenesis, hepatic cirrhosis, diabetes, and cancer. NA
CHST2 9435 ENSG00000175040 carbohydrate sulfotransferase 2 This locus encodes a sulfotransferase protein. The encoded enzyme catalyzes the sulfation of a nonreducing N-acetylglucosamine residue, and may play a role in biosynthesis of 6-sulfosialyl Lewis X antigen. NA
PPP1R3C 5507 ENSG00000119938 protein phosphatase 1 regulatory subunit 3C This gene encodes a regulatory subunit of protein phosphatase-1 (PP1). PP1 catalyzes reversible protein phosphorylation, which is important in a wide range of cellular activities: neuronal, muscular, RNA splicing, protein synthesis, cell death, and glycogen metabolism, to name just a few. By interacting with different regulatory subunits, PP1 is directed to different parts of the cell, to different substrates, or to respond to extracellular signals. NA
FGG 2266 ENSG00000171557 fibrinogen gamma chain The protein encoded by this gene is the gamma component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia and thrombophilia. Alternative splicing results in transcript variants encoding different isoforms. NA
APOA1 335 ENSG00000118137 apolipoprotein A1 This gene encodes apolipoprotein A-I, which is the major protein component of high density lipoprotein (HDL) in plasma. The encoded preproprotein is proteolytically processed to generate the mature protein, which promotes cholesterol efflux from tissues to the liver for excretion, and is a cofactor for lecithin cholesterolacyltransferase (LCAT), an enzyme responsible for the formation of most plasma cholesteryl esters. This gene is closely linked with two other apolipoprotein genes on chromosome 11. Defects in this gene are associated with HDL deficiencies, including Tangier disease, and with systemic non-neuropathic amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein. NA
COLEC12 81035 ENSG00000158270 collectin subfamily member 12 This gene encodes a member of the C-lectin family, proteins that possess collagen-like sequences and carbohydrate recognition domains. This protein is a scavenger receptor, a cell surface glycoprotein that displays several functions associated with host defense. It can bind to carbohydrate antigens on microorganisms, facilitating their recognition and removal. It also mediates the recognition, internalization, and degradation of oxidatively modified low density lipoprotein by vascular endothelial cells. NA
RP1-244F24.1 ENSG00000271857 ENSG00000271857 NA NA NA
TMEM56 148534 ENSG00000152078 transmembrane protein 56 NA NA
BEST1 7439 ENSG00000167995 bestrophin 1 This gene encodes a member of the bestrophin gene family. This small gene family is characterized by proteins with a highly conserved N-terminus with four to six transmembrane domains. Bestrophins may form chloride ion channels or may regulate voltage-gated L-type calcium-ion channels. Bestrophins are generally believed to form calcium-activated chloride-ion channels in epithelial cells but they have also been shown to be highly permeable to bicarbonate ion transport in retinal tissue. Mutations in this gene are responsible for juvenile-onset vitelliform macular dystrophy (VMD2), also known as Best macular dystrophy, in addition to adult-onset vitelliform macular dystrophy (AVMD) and other retinopathies. Alternative splicing results in multiple variants encoding distinct isoforms. NA
DUX4L50 ENSG00000232815 ENSG00000232815 double homeobox 4 like 50, pseudogene NA NA
STARD10 10809 ENSG00000214530 StAR related lipid transfer domain containing 10 NA NA
TSPAN5 10098 ENSG00000168785 tetraspanin 5 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. NA
APOC3 345 ENSG00000110245 apolipoprotein C3 Apolipoprotein C-III is a very low density lipoprotein (VLDL) protein. APOC3 inhibits lipoprotein lipase and hepatic lipase; it is thought to delay catabolism of triglyceride-rich particles. The APOA1, APOC3 and APOA4 genes are closely linked in both rat and human genomes. The A-I and A-IV genes are transcribed from the same strand, while the A-1 and C-III genes are convergently transcribed. An increase in apoC-III levels induces the development of hypertriglyceridemia. NA
ALDOB 229 ENSG00000136872 aldolase, fructose-bisphosphate B Fructose-1,6-bisphosphate aldolase (EC 4.1.2.13) is a tetrameric glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Vertebrates have 3 aldolase isozymes which are distinguished by their electrophoretic and catalytic properties. Differences indicate that aldolases A, B, and C are distinct proteins, the products of a family of related ‘housekeeping’ genes exhibiting developmentally regulated expression of the different isozymes. The developing embryo produces aldolase A, which is produced in even greater amounts in adult muscle where it can be as much as 5% of total cellular protein. In adult liver, kidney and intestine, aldolase A expression is repressed and aldolase B is produced. In brain and other nervous tissue, aldolase A and C are expressed about equally. There is a high degree of homology between aldolase A and C. Defects in ALDOB cause hereditary fructose intolerance. NA
HPD 3242 ENSG00000158104 4-hydroxyphenylpyruvate dioxygenase The protein encoded by this gene is an enzyme in the catabolic pathway of tyrosine. The encoded protein catalyzes the conversion of 4-hydroxyphenylpyruvate to homogentisate. Defects in this gene are a cause of tyrosinemia type 3 (TYRO3) and hawkinsinuria (HAWK). Two transcript variants encoding different isoforms have been found for this gene. NA
APOA1-AS 104326055 ENSG00000235910 APOA1 antisense RNA NA NA
ASPHD2 57168 ENSG00000128203 aspartate beta-hydroxylase domain containing 2 NA NA
ITIH3 3699 ENSG00000162267 inter-alpha-trypsin inhibitor heavy chain 3 This gene encodes the heavy chain subunit of the pre-alpha-trypsin inhibitor complex. This complex may stabilize the extracellular matrix through its ability to bind hyaluronic acid. Polymorphisms of this gene may be associated with increased risk for schizophrenia and major depressive disorder. This gene is present in an inter-alpha-trypsin inhibitor family gene cluster on chromosome 3. NA
PBX4 80714 ENSG00000105717 PBX homeobox 4 This gene encodes a member of the pre-B cell leukemia transcription factor family. These proteins are homeobox proteins that play critical roles in embryonic development and cellular differentiation both as Hox cofactors and through Hox-independent pathways. The encoded protein contains a homeobox DNA-binding domain, but specific functions of the protein have not been determined. Alternatively spliced transcript variants have been observed for this gene. NA
TMEM119 338773 ENSG00000183160 transmembrane protein 119 NA NA
PMP22 5376 ENSG00000109099 peripheral myelin protein 22 This gene encodes an integral membrane protein that is a major component of myelin in the peripheral nervous system. Studies suggest two alternately used promoters drive tissue-specific expression. Various mutations of this gene are causes of Charcot-Marie-Tooth disease Type IA, Dejerine-Sottas syndrome, and hereditary neuropathy with liability to pressure palsies. Alternative splicing results in multiple transcript variants. NA
AGT 183 ENSG00000135744 angiotensinogen The protein encoded by this gene, pre-angiotensinogen or angiotensinogen precursor, is expressed in the liver and is cleaved by the enzyme renin in response to lowered blood pressure. The resulting product, angiotensin I, is then cleaved by angiotensin converting enzyme (ACE) to generate the physiologically active enzyme angiotensin II. The protein is involved in maintaining blood pressure and in the pathogenesis of essential hypertension and preeclampsia. Mutations in this gene are associated with susceptibility to essential hypertension, and can cause renal tubular dysgenesis, a severe disorder of renal tubular development. Defects in this gene have also been associated with non-familial structural atrial fibrillation, and inflammatory bowel disease. NA
CYP2C8 1558 ENSG00000138115 cytochrome P450 family 2 subfamily C member 8 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum and its expression is induced by phenobarbital. The enzyme is known to metabolize many xenobiotics, including the anticonvulsive drug mephenytoin, benzo(a)pyrene, 7-ethoxycoumarin, and the anti-cancer drug taxol. This gene is located within a cluster of cytochrome P450 genes on chromosome 10q24. Several transcript variants encoding a few different isoforms have been found for this gene. NA
RP11-862L9.3 ENSG00000266844 ENSG00000266844 NA NA NA
COTL1 23406 ENSG00000103187 coactosin like F-actin binding protein 1 This gene encodes one of the numerous actin-binding proteins which regulate the actin cytoskeleton. This protein binds F-actin, and also interacts with 5-lipoxygenase, which is the first committed enzyme in leukotriene biosynthesis. Although this gene has been reported to map to chromosome 17 in the Smith-Magenis syndrome region, the best alignments for this gene are to chromosome 16. The Smith-Magenis syndrome region is the site of two related pseudogenes. NA
THY1 7070 ENSG00000154096 Thy-1 cell surface antigen This gene encodes a cell surface glycoprotein and member of the immunoglobulin superfamily of proteins. The encoded protein is involved in cell adhesion and cell communication in numerous cell types, but particularly in cells of the immune and nervous systems. The encoded protein is widely used as a marker for hematopoietic stem cells. This gene may function as a tumor suppressor in nasopharyngeal carcinoma. Alternative splicing results in multiple transcript variants. NA
OMG 4974 ENSG00000126861 oligodendrocyte myelin glycoprotein NA NA
CCDC188 388849 ENSG00000234409 coiled-coil domain containing 188 NA NA
CPM 1368 ENSG00000135678 carboxypeptidase M The protein encoded by this gene is a membrane-bound arginine/lysine carboxypeptidase. Its expression is associated with monocyte to macrophage differentiation. This encoded protein contains hydrophobic regions at the amino and carboxy termini and has 6 potential asparagine-linked glycosylation sites. The active site residues of carboxypeptidases A and B are conserved in this protein. Three alternatively spliced transcript variants encoding the same protein have been described for this gene. NA
GPRC5C 55890 ENSG00000170412 G protein-coupled receptor class C group 5 member C The protein encoded by this gene is a member of the type 3 G protein-coupled receptor family. Members of this superfamily are characterized by a signature 7-transmembrane domain motif. The specific function of this protein is unknown; however, this protein may mediate the cellular effects of retinoic acid on the G protein signal transduction cascade. Two transcript variants encoding different isoforms have been found for this gene. NA
HSPA4L 22824 ENSG00000164070 heat shock protein family A (Hsp70) member 4 like The protein encoded by this gene is heat shock inducible and may act as a chaperone. The encoded protein can protect the heat-shocked cell against the harmful effects of aggregated proteins. This gene is highly expressed in leukemia cells and may be a good target for therapeutic intervention. Several transcripts encoding different isoforms have been found for this gene. NA
TSPAN12 23554 ENSG00000106025 tetraspanin 12 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. NA
CGNL1 84952 ENSG00000128849 cingulin-like 1 This gene encodes a member of the cingulin family. The encoded protein localizes to both adherens and tight cell-cell junctions and mediates junction assembly and maintenance by regulating the activity of the small GTPases RhoA and Rac1. Heterozygous chromosomal rearrangements resulting in association of the promoter for this gene with the aromatase gene are a cause of aromatase excess syndrome. Alternatively spliced transcript variants have been observed for this gene. NA
RP11-334E6.12 ENSG00000263873 ENSG00000263873 NA NA NA
HOGA1 112817 ENSG00000241935 4-hydroxy-2-oxoglutarate aldolase 1 The authors of PMID:20797690 cloned this gene while searching for genes in a region of chromosome 10 linked to primary hyperoxaluria type III. They noted that even though the encoded protein has been described as a mitochondrial dihydrodipicolinate synthase-like enzyme, it shares little homology with E. coli dihydrodipicolinate synthase (Dhdps), particularly in the putative substrate-binding region. Moreover, neither lysine biosynthesis nor sialic acid metabolism, for which Dhdps is responsible, occurs in vertebrate mitochondria. They propose that this gene encodes mitochondrial 4-hydroxyl-2-oxoglutarate aldolase (EC 4.1.3.16), which catalyzes the final step in the metabolic pathway of hydroxyproline, releasing glyoxylate and pyruvate. This gene is predominantly expressed in the liver and kidney, and mutations in this gene are found in patients with primary hyperoxaluria type III. Alternatively spliced transcript variants encoding different isoforms have been noted for this gene. NA
RHOD 29984 ENSG00000173156 ras homolog family member D Ras homolog, or Rho, proteins interact with protein kinases and may serve as targets for activated GTPase. They play a critical role in muscle differentiation. The protein encoded by this gene binds GTP and is a member of the small GTPase superfamily. It is involved in endosome dynamics and reorganization of the actin cytoskeleton, and it may coordinate membrane transport with the function of the cytoskeleton. Two transcript variants encoding different isoforms have been found for this gene. NA
NA NA ENSG00000255824 NA NA TRUE
APOA2 336 ENSG00000158874 apolipoprotein A2 This gene encodes apolipoprotein (apo-) A-II, which is the second most abundant protein of the high density lipoprotein particles. The protein is found in plasma as a monomer, homodimer, or heterodimer with apolipoprotein D. Defects in this gene may result in apolipoprotein A-II deficiency or hypercholesterolemia. NA
CTHRC1 115908 ENSG00000164932 collagen triple helix repeat containing 1 This locus encodes a protein that may play a role in the cellular response to arterial injury through involvement in vascular remodeling. Mutations at this locus have been associated with Barrett esophagus and esophageal adenocarcinoma. Alternatively spliced transcript variants have been described. NA
PTGES 9536 ENSG00000148344 prostaglandin E synthase The protein encoded by this gene is a glutathione-dependent prostaglandin E synthase. The expression of this gene has been shown to be induced by proinflammatory cytokine interleukin 1 beta (IL1B). Its expression can also be induced by tumor suppressor protein TP53, and may be involved in TP53 induced apoptosis. Knockout studies in mice suggest that this gene may contribute to the pathogenesis of collagen-induced arthritis and mediate acute pain during inflammatory responses. NA
IL1RN 3557 ENSG00000136689 interleukin 1 receptor antagonist The protein encoded by this gene is a member of the interleukin 1 cytokine family. This protein inhibits the activities of interleukin 1, alpha (IL1A) and interleukin 1, beta (IL1B), and modulates a variety of interleukin 1 related immune and inflammatory responses. This gene and five other closely related cytokine genes form a gene cluster spanning approximately 400 kb on chromosome 2. A polymorphism of this gene is reported to be associated with increased risk of osteoporotic fractures and gastric cancer. Several alternatively spliced transcript variants encoding distinct isoforms have been reported. NA
SENCR 100507392 ENSG00000254703 smooth muscle and endothelial cell enriched migration/differentiation-associated long non-coding RNA NA NA
KCNG2 26251 ENSG00000178342 potassium voltage-gated channel modifier subfamily G member 2 Voltage-gated potassium (Kv) channels represent the most complex class of voltage-gated ion channels from both functional and structural standpoints. Their diverse functions include regulating neurotransmitter release, heart rate, insulin secretion, neuronal excitability, epithelial electrolyte transport, smooth muscle contraction, and cell volume. This gene encodes a member of the potassium channel, voltage-gated, subfamily G. This member is a gamma subunit of the voltage-gated potassium channel. The delayed-rectifier type channels containing this subunit may contribute to cardiac action potential repolarization. NA
CRP 1401 ENSG00000132693 C-reactive protein, pentraxin-related The protein encoded by this gene belongs to the pentaxin family. It is involved in several host defense related functions based on its ability to recognize foreign pathogens and damaged cells of the host and to initiate their elimination by interacting with humoral and cellular effector systems in the blood. Consequently, the level of this protein in plasma increases greatly during acute phase response to tissue injury, infection, or other inflammatory stimuli. NA
ASPHD1 253982 ENSG00000174939 aspartate beta-hydroxylase domain containing 1 NA NA
EPHX2 2053 ENSG00000120915 epoxide hydrolase 2 This gene encodes a member of the epoxide hydrolase family. The protein, found in both the cytosol and peroxisomes, binds to specific epoxides and converts them to the corresponding dihydrodiols. Mutations in this gene have been associated with familial hypercholesterolemia. Alternatively spliced transcript variants have been described. NA
RP11-356B19.11 ENSG00000271833 ENSG00000271833 NA NA NA
IQGAP2 10788 ENSG00000145703 IQ motif containing GTPase activating protein 2 This gene encodes a member of the IQGAP family. The protein contains three IQ domains, one calponin homology domain, one Ras-GAP domain and one WW domain. It interacts with components of the cytoskeleton, with cell adhesion molecules, and with several signaling molecules to regulate cell morphology and motility. NA
HPDL 84842 ENSG00000186603 4-hydroxyphenylpyruvate dioxygenase like NA NA
ACE 1636 ENSG00000159640 angiotensin I converting enzyme This gene encodes an enzyme involved in catalyzing the conversion of angiotensin I into a physiologically active peptide angiotensin II. Angiotensin II is a potent vasopressor and aldosterone-stimulating peptide that controls blood pressure and fluid-electrolyte balance. This enzyme plays a key role in the renin-angiotensin system. Many studies have associated the presence or absence of a 287 bp Alu repeat element in this gene with the levels of circulating enzyme or cardiovascular pathophysiologies. Multiple alternatively spliced transcript variants encoding different isoforms have been identified, and two most abundant spliced variants encode the somatic form and the testicular form, respectively, that are equally active. NA
APOD 347 ENSG00000189058 apolipoprotein D This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. NA
GLS2 27165 ENSG00000135423 glutaminase 2 The protein encoded by this gene is a mitochondrial phosphate-activated glutaminase that catalyzes the hydrolysis of glutamine to stoichiometric amounts of glutamate and ammonia. Originally thought to be liver-specific, this protein has been found in other tissues as well. Alternative splicing results in multiple transcript variants that encode different isoforms. NA
KRT8 3856 ENSG00000170421 keratin 8 This gene is a member of the type II keratin family clustered on the long arm of chromosome 12. Type I and type II keratins heteropolymerize to form intermediate-sized filaments in the cytoplasm of epithelial cells. The product of this gene typically dimerizes with keratin 18 to form an intermediate filament in simple single-layered epithelial cells. This protein plays a role in maintaining cellular structural integrity and also functions in signal transduction and cellular differentiation. Mutations in this gene cause cryptogenic cirrhosis. Alternatively spliced transcript variants have been found for this gene. NA
PKIA 5569 ENSG00000171033 protein kinase (cAMP-dependent, catalytic) inhibitor alpha The protein encoded by this gene is a member of the cAMP-dependent protein kinase (PKA) inhibitor family. This protein was demonstrated to interact with and inhibit the activities of both C alpha and C beta catalytic subunits of the PKA. Alternatively spliced transcript variants encoding the same protein have been reported. NA
FAR2 55711 ENSG00000064763 fatty acyl-CoA reductase 2 This gene belongs to the short chain dehydrogenase/reductase superfamily. It encodes a reductase enzyme involved in the first step of wax biosynthesis wherein fatty acids are converted to fatty alcohols. The encoded peroxisomal protein utilizes saturated fatty acids of 16 or 18 carbons as preferred substrates. Alternatively spliced transcript variants have been observed for this gene. Related pseudogenes have been identified on chromosomes 2, 14 and 22. NA
INMT 11185 ENSG00000241644 indolethylamine N-methyltransferase N-methylation of endogenous and xenobiotic compounds is a major method by which they are degraded. This gene encodes an enzyme that N-methylates indoles such as tryptamine. Alternative splicing results in multiple transcript variants. Read-through transcription also exists between this gene and the downstream FAM188B (family with sequence similarity 188, member B) gene. NA
FAM171B 165215 ENSG00000144369 family with sequence similarity 171 member B NA NA
CHN1 1123 ENSG00000128656 chimerin 1 This gene encodes GTPase-activating protein for ras-related p21-rac and a phorbol ester receptor. It is predominantly expressed in neurons, and plays an important role in neuronal signal-transduction mechanisms. Mutations in this gene are associated with Duane’s retraction syndrome 2 (DURS2). Alternatively spliced transcript variants encoding different isoforms have been described for this gene. NA
CDC42EP2 10435 ENSG00000149798 CDC42 effector protein 2 CDC42, a small Rho GTPase, regulates the formation of F-actin-containing structures through its interaction with the downstream effector proteins. The protein encoded by this gene is a member of the Borg family of CDC42 effector proteins. Borg family proteins contain a CRIB (Cdc42/Rac interactive-binding) domain. They bind to, and negatively regulate the function of CDC42. Coexpression of this protein with CDC42 suggested a role of this protein in actin filament assembly and cell shape control. NA
GAREM2 150946 ENSG00000157833 GRB2 associated regulator of MAPK1 subtype 2 NA NA
MYO5A 4644 ENSG00000197535 myosin VA This gene is one of three myosin V heavy-chain genes, belonging to the myosin gene superfamily. Myosin V is a class of actin-based motor proteins involved in cytoplasmic vesicle transport and anchorage, spindle-pole alignment and mRNA translocation. The protein encoded by this gene is abundant in melanocytes and nerve cells. Mutations in this gene cause Griscelli syndrome type-1 (GS1), Griscelli syndrome type-3 (GS3) and neuroectodermal melanolysosomal disease, or Elejalde disease. Multiple alternatively spliced transcript variants encoding different isoforms have been reported, but the full-length nature of some variants has not been determined. NA
MCTP1 79772 ENSG00000175471 multiple C2 and transmembrane domain containing 1 NA NA
PLEKHG4 25894 ENSG00000196155 pleckstrin homology and RhoGEF domain containing G4 The protein encoded by this gene can function as a guanine nucleotide exchange factor (GEF) and may play a role in intracellular signaling and cytoskeleton dynamics at the Golgi apparatus. Polymorphisms in the region of this gene have been found to be associated with spinocerebellar ataxia in some study populations. Alternative splicing results in multiple transcript variants. NA
LAMA5-AS1 101928158 ENSG00000228812 LAMA5 antisense RNA 1 NA NA
FYN 2534 ENSG00000010810 FYN proto-oncogene, Src family tyrosine kinase This gene is a member of the protein-tyrosine kinase oncogene family. It encodes a membrane-associated tyrosine kinase that has been implicated in the control of cell growth. The protein associates with the p85 subunit of phosphatidylinositol 3-kinase and interacts with the fyn-binding protein. Alternatively spliced transcript variants encoding distinct isoforms exist. NA
RASA3 22821 ENSG00000185989 RAS p21 protein activator 3 This gene encodes a protein that binds inositol 1,3,4,5-tetrakisphosphate and stimulates the GTPase activity of Ras p21. This protein functions as a negative regulator of the Ras signalling pathway. It is localized to the cell membrane via a pleckstrin homology (PH) domain in the C-terminal region. Alternative splicing results in multiple transcript variants. NA
ADCY3 109 ENSG00000138031 adenylate cyclase 3 This gene encodes adenylyl cyclase 3 which is a membrane-associated enzyme and catalyzes the formation of the secondary messenger cyclic adenosine monophosphate (cAMP). This protein appears to be widely expressed in various human tissues and may be involved in a number of physiological and pathophysiological metabolic processes. Two transcript variants encoding different isoforms have been found for this gene. NA
LIMK1 3984 ENSG00000106683 LIM domain kinase 1 There are approximately 40 known eukaryotic LIM proteins, so named for the LIM domains they contain. LIM domains are highly conserved cysteine-rich structures containing 2 zinc fingers. Although zinc fingers usually function by binding to DNA or RNA, the LIM motif probably mediates protein-protein interactions. LIM kinase-1 and LIM kinase-2 belong to a small subfamily with a unique combination of 2 N-terminal LIM motifs and a C-terminal protein kinase domain. LIMK1 is a serine/threonine kinase that regulates actin polymerization via phosphorylation and inactivation of the actin binding factor cofilin. This protein is ubiquitously expressed during development and plays a role in many cellular processes associated with cytoskeletal structure. This protein also stimulates axon growth and may play a role in brain development. LIMK1 hemizygosity is implicated in the impaired visuospatial constructive cognition of Williams syndrome. Alternative splicing results in multiple transcript variants encoding distinct isoforms. NA
SLC7A6 9057 ENSG00000103064 solute carrier family 7 member 6 NA NA
BCO2 83875 ENSG00000197580 beta-carotene oxygenase 2 This gene encodes an enzyme which oxidizes carotenoids such as beta-carotene during the biosynthesis of vitamin A. Multiple transcript variants encoding different isoforms have been found for this gene. NA
PLA2G2A 5320 ENSG00000188257 phospholipase A2 group IIA The protein encoded by this gene is a member of the phospholipase A2 family (PLA2). PLA2s constitute a diverse family of enzymes with respect to sequence, function, localization, and divalent cation requirements. This gene product belongs to group II, which contains secreted form of PLA2, an extracellular enzyme that has a low molecular mass and requires calcium ions for catalysis. It catalyzes the hydrolysis of the sn-2 fatty acid acyl ester bond of phosphoglycerides, releasing free fatty acids and lysophospholipids, and thought to participate in the regulation of the phospholipid metabolism in biomembranes. Several alternatively spliced transcript variants with different 5’ UTRs have been found for this gene. NA
ZNF469 84627 ENSG00000225614 zinc finger protein 469 This gene encodes a zinc-finger protein. Low-percent homology to certain collagens suggests that it may function as a transcription factor or extra-nuclear regulator factor for the synthesis or organization of collagen fibers. Mutations in this gene cause brittle cornea syndrome. NA
RNF157 114804 ENSG00000141576 ring finger protein 157 NA NA
NA NA ENSG00000272016 NA NA TRUE
SSH2 85464 ENSG00000141298 slingshot protein phosphatase 2 This gene encodes a protein tyrosine phosphatase that plays a key role in the regulation of actin filaments. The encoded protein dephosphorylates and activates cofilin, which promotes actin filament depolymerization. Alternative splicing results in multiple transcript variants. NA
KCNAB3 9196 ENSG00000170049 potassium voltage-gated channel subfamily A regulatory beta subunit 3 This gene encodes a member of the potassium channel, voltage-gated, shaker-related subfamily. The encoded protein is one of the beta subunits, which are auxiliary proteins associating with functional Kv-alpha subunits. The encoded protein forms a heterodimer with the potassium voltage-gated channel, shaker-related subfamily, member 5 gene product and regulates the activity of the alpha subunit. NA
SLC39A5 283375 ENSG00000139540 solute carrier family 39 member 5 The protein encoded by this gene belongs to the ZIP family of zinc transporters that transport zinc into cells from outside, and play a crucial role in controlling intracellular zinc levels. Zinc is an essential cofactor for many enzymes and proteins involved in gene transcription, growth, development and differentiation. Mutations in this gene have been associated with autosomal dominant high myopia (MYP24). Alternatively spliced transcript variants have been found for this gene. NA
C9orf3 84909 ENSG00000148120 chromosome 9 open reading frame 3 This gene encodes a member of the M1 zinc aminopeptidase family. The encoded protein is a zinc-dependent metallopeptidase that catalyzes the removal of an amino acid from the amino terminus of a protein or peptide. This protein may play a role in the generation of angiotensin IV. Alternate splicing results in multiple transcript variants. NA
GATM 2628 ENSG00000171766 glycine amidinotransferase This gene encodes a mitochondrial enzyme that belongs to the amidinotransferase family. This enzyme is involved in creatine biosynthesis, whereby it catalyzes the transfer of a guanido group from L-arginine to glycine, resulting in guanidinoacetic acid, the immediate precursor of creatine. Mutations in this gene cause arginine:glycine amidinotransferase deficiency, an inborn error of creatine synthesis characterized by mental retardation, language impairment, and behavioral disorders. NA
CELF2 10659 ENSG00000048740 CUGBP, Elav-like family member 2 Members of the CELF/BRUNOL protein family contain two N-terminal RNA recognition motif (RRM) domains, one C-terminal RRM domain, and a divergent segment of 160-230 aa between the second and third RRM domains. Members of this protein family regulate pre-mRNA alternative splicing and may also be involved in mRNA editing, and translation. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
PDPN 10630 ENSG00000162493 podoplanin This gene encodes a type-I integral membrane glycoprotein with diverse distribution in human tissues. The physiological function of this protein may be related to its mucin-type character. The homologous protein in other species has been described as a differentiation antigen and influenza-virus receptor. The specific function of this protein has not been determined but it has been proposed as a marker of lung injury. Alternatively spliced transcript variants encoding different isoforms have been identified. NA
HDHD3 81932 ENSG00000119431 haloacid dehalogenase like hydrolase domain containing 3 NA NA
CLDN11 5010 ENSG00000013297 claudin 11 This gene encodes a member of the claudin family. Claudins are integral membrane proteins and components of tight junction strands. Tight junction strands serve as a physical barrier to prevent solutes and water from passing freely through the paracellular space between epithelial or endothelial cell sheets, and also play critical roles in maintaining cell polarity and signal transductions. The protein encoded by this gene is a major component of central nervous system (CNS) myelin and plays an important role in regulating proliferation and migration of oligodendrocytes. Mouse studies showed that the gene deficiency results in deafness and loss of the Sertoli cell epithelial phenotype in the testis. This protein is a tight junction protein at the human blood-testis barrier (BTB), and the BTB disruption is related to a dysfunction of this gene. Alternatively spliced transcript variants encoding different isoforms have been identified. NA
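Some rows in the table above have notfound set to TRUE (for example ENSG00000255824 and ENSG00000272016), meaning the query returned no annotation for that Ensembl ID. The following is a minimal sketch of separating annotated from unannotated queries. It uses a small hand-built data frame with a subset of the columns of as.data.frame(out) instead of a live mygene query, and the names toy_out, annotated and missing_ids are assumptions made for illustration.

```r
# Toy stand-in for as.data.frame(out); values copied from rows of the table above.
toy_out <- data.frame(
  symbol   = c("APOA2", NA, "CTHRC1", NA),
  X_id     = c("336", NA, "115908", NA),
  query    = c("ENSG00000158874", "ENSG00000255824",
               "ENSG00000164932", "ENSG00000272016"),
  notfound = c(NA, TRUE, NA, TRUE),
  stringsAsFactors = FALSE
)

# notfound is TRUE for unmatched IDs and NA otherwise, so test with %in%
# to keep NA values out of the logical row index.
is_missing <- toy_out$notfound %in% TRUE

annotated   <- toy_out[!is_missing, ]      # rows with an annotation
missing_ids <- toy_out$query[is_missing]   # Ensembl IDs without a match

print(annotated[, c("symbol", "query")])
print(missing_ids)
```

Keeping the unmatched IDs in a separate vector makes it easy to report them or retry them against another annotation source without dropping them silently.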
# Save the queried Ensembl IDs for factor 2, one per line.
write.table(out$query,
            file = paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 2, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
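
The export above is written for a single factor, with the factor index fixed in the file name. As a sketch of how it could be looped over factors, the following writes one file per row of a matrix shaped like gene_list (factors in rows, top genes in columns). The toy matrix, the temporary output directory and the helper name write_factor_genes are assumptions for illustration. Unlike the call above, the sketch writes the IDs straight from the matrix and not from out$query.

```r
# Hypothetical helper: write one gene-name file per factor (row of `genes`).
write_factor_genes <- function(genes, out_dir, prefix = "gene_names_clus_") {
  paths <- character(nrow(genes))
  for (k in seq_len(nrow(genes))) {
    paths[k] <- file.path(out_dir, paste0(prefix, k, ".txt"))
    # One Ensembl ID per line, no header, no quotes.
    write.table(genes[k, ], file = paths[k],
                col.names = FALSE, row.names = FALSE, quote = FALSE)
  }
  paths
}

# Toy stand-in for gene_list: 2 factors x 3 genes (IDs taken from the tables).
toy_gene_list <- rbind(
  c("ENSG00000135744", "ENSG00000138115", "ENSG00000103187"),
  c("ENSG00000124920", "ENSG00000131095", "ENSG00000136872")
)

# Write into a fresh temporary directory so the sketch has no side effects.
out_dir <- tempfile("gene_names_")
dir.create(out_dir)
written <- write_factor_genes(toy_gene_list, out_dir)

print(basename(written))
print(readLines(written[2]))
```

Using seq_len(nrow(genes)) keeps the loop safe when the matrix has zero rows, and returning the file paths lets the caller check what was written.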

Factor 3 Annotations

# Annotate the top genes of factor 3 by Ensembl gene ID (name, summary, symbol).
out <- mygene::queryMany(gene_list[3, ], scopes = "ensembl.gene", fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol query X_id name summary notfound
UBE2R2-AS1 ENSG00000235481 ENSG00000235481 UBE2R2 antisense RNA 1 NA NA
MYRF ENSG00000124920 745 myelin regulatory factor This gene encodes a transcription factor that is required for central nervous system myelination and may regulate oligodendrocyte differentiation. It is thought to act by increasing the expression of genes that effect myelin production but may also directly promote myelin gene expression. Loss of a similar gene in mouse models results in severe demyelination. Alternative splicing results in multiple transcript variants. NA
PRSS3 ENSG00000010438 5646 protease, serine 3 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is expressed in the brain and pancreas and is resistant to common trypsin inhibitors. It is active on peptide linkages involving the carboxyl group of lysine or arginine. This gene is localized to the locus of T cell receptor beta variable orphans on chromosome 9. Four transcript variants encoding different isoforms have been described for this gene. NA
GFAP ENSG00000131095 2670 glial fibrillary acidic protein This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. NA
GIPR ENSG00000010310 2696 gastric inhibitory polypeptide receptor This gene encodes a G-protein coupled receptor for gastric inhibitory polypeptide (GIP), which was originally identified as an activity in gut extracts that inhibited gastric acid secretion and gastrin release, but subsequently was demonstrated to stimulate insulin release in the presence of elevated glucose. Mice lacking this gene exhibit higher blood glucose levels with impaired initial insulin response after oral glucose load. Defect in this gene thus may contribute to the pathogenesis of diabetes. NA
ALDOB ENSG00000136872 229 aldolase, fructose-bisphosphate B Fructose-1,6-bisphosphate aldolase (EC 4.1.2.13) is a tetrameric glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Vertebrates have 3 aldolase isozymes which are distinguished by their electrophoretic and catalytic properties. Differences indicate that aldolases A, B, and C are distinct proteins, the products of a family of related ‘housekeeping’ genes exhibiting developmentally regulated expression of the different isozymes. The developing embryo produces aldolase A, which is produced in even greater amounts in adult muscle where it can be as much as 5% of total cellular protein. In adult liver, kidney and intestine, aldolase A expression is repressed and aldolase B is produced. In brain and other nervous tissue, aldolase A and C are expressed about equally. There is a high degree of homology between aldolase A and C. Defects in ALDOB cause hereditary fructose intolerance. NA
CMKLR1 ENSG00000174600 1240 chemerin chemokine-like receptor 1 NA NA
HAMP ENSG00000105697 57817 hepcidin antimicrobial peptide The product encoded by this gene is involved in the maintenance of iron homeostasis, and it is necessary for the regulation of iron storage in macrophages, and for intestinal iron absorption. The preproprotein is post-translationally cleaved into mature peptides of 20, 22 and 25 amino acids, and these active peptides are rich in cysteines, which form intramolecular bonds that stabilize their beta-sheet structures. These peptides exhibit antimicrobial activity against bacteria and fungi. Mutations in this gene cause hemochromatosis type 2B, also known as juvenile hemochromatosis, a disease caused by severe iron overload that results in cardiomyopathy, cirrhosis, and endocrine failure. NA
FBLL1 ENSG00000188573 ENSG00000188573 fibrillarin-like 1 NA NA
CLGN ENSG00000153132 1047 calmegin Calmegin is a testis-specific endoplasmic reticulum chaperone protein. CLGN may play a role in spermatogenesis and infertility. NA
ASPHD1 ENSG00000174939 253982 aspartate beta-hydroxylase domain containing 1 NA NA
SLC16A9 ENSG00000165449 220963 solute carrier family 16 member 9 NA NA
ITGA8 ENSG00000077943 8516 integrin subunit alpha 8 Integrins are heterodimeric transmembrane receptor proteins that mediate numerous cellular processes including cell adhesion, cytoskeletal rearrangement, and activation of cell signaling pathways. Integrins are composed of alpha and beta subunits. This gene encodes the alpha 8 subunit of the heterodimeric integrin alpha8beta1 protein. The encoded protein is a single-pass type 1 membrane protein that contains multiple FG-GAP repeats. This repeat is predicted to fold into a beta propeller structure. This gene regulates the recruitment of mesenchymal cells into epithelial structures, mediates cell-cell interactions, and regulates neurite outgrowth of sensory and motor neurons. The integrin alpha8beta1 protein thus plays an important role in wound-healing and organogenesis. Mutations in this gene have been associated with renal hypodysplasia/aplasia-1 (RHDA1) and with several animal models of chronic kidney disease. Alternate splicing results in multiple transcript variants encoding distinct isoforms. NA
SNCB ENSG00000074317 6620 synuclein beta This gene encodes a member of a small family of proteins that inhibit phospholipase D2 and may function in neuronal plasticity. The encoded protein is abundant in lesions of patients with Alzheimer disease. A mutation in this gene was found in individuals with dementia with Lewy bodies. Alternative splicing results in multiple transcript variants. NA
AOC3 ENSG00000131471 8639 amine oxidase, copper containing 3 This gene encodes a member of the semicarbazide-sensitive amine oxidase family. Copper amine oxidases catalyze the oxidative conversion of amines to aldehydes in the presence of copper and quinone cofactor. The encoded protein is localized to the cell surface, has adhesive properties as well as monoamine oxidase activity, and may be involved in leukocyte trafficking. Alterations in levels of the encoded protein may be associated with many diseases, including diabetes mellitus. A pseudogene of this gene has been described and is located approximately 9-kb downstream on the same chromosome. Alternative splicing results in multiple transcript variants. NA
PPP1R1B ENSG00000131771 84152 protein phosphatase 1 regulatory inhibitor subunit 1B This gene encodes a bifunctional signal transduction molecule. Dopaminergic and glutamatergic receptor stimulation regulates its phosphorylation and function as a kinase or phosphatase inhibitor. As a target for dopamine, this gene may serve as a therapeutic target for neurologic and psychiatric disorders. Multiple transcript variants encoding different isoforms have been found for this gene. NA
MEF2C ENSG00000081189 4208 myocyte enhancer factor 2C This locus encodes a member of the MADS box transcription enhancer factor 2 (MEF2) family of proteins, which play a role in myogenesis. The encoded protein, MEF2 polypeptide C, has both trans-activating and DNA binding activities. This protein may play a role in maintaining the differentiated state of muscle cells. Mutations and deletions at this locus have been associated with severe mental retardation, stereotypic movements, epilepsy, and cerebral malformation. Alternatively spliced transcript variants have been described. NA
CTC-467M3.1 ENSG00000245864 ENSG00000245864 NA NA NA
RUNDC3A ENSG00000108309 10900 RUN domain containing 3A NA NA
PAIP2B ENSG00000124374 400961 poly(A) binding protein interacting protein 2B Most mRNAs, except for histones, contain a 3-prime poly(A) tail. Poly(A)-binding protein (PABP; see MIM 604679) enhances translation by circularizing mRNA through its interaction with the translation initiation factor EIF4G1 (MIM 600495) and the poly(A) tail. Various PABP-binding proteins regulate PABP activity, including PAIP1 (MIM 605184), a translational stimulator, and PAIP2A (MIM 605604) and PAIP2B, translational inhibitors (Derry et al., 2006 [PubMed 17381337]). NA
AC007036.5 ENSG00000251660 ENSG00000251660 NA NA NA
GATM ENSG00000171766 2628 glycine amidinotransferase This gene encodes a mitochondrial enzyme that belongs to the amidinotransferase family. This enzyme is involved in creatine biosynthesis, whereby it catalyzes the transfer of a guanido group from L-arginine to glycine, resulting in guanidinoacetic acid, the immediate precursor of creatine. Mutations in this gene cause arginine:glycine amidinotransferase deficiency, an inborn error of creatine synthesis characterized by mental retardation, language impairment, and behavioral disorders. NA
PDIA2 ENSG00000185615 64714 protein disulfide isomerase family A member 2 Protein disulfide isomerases (EC 5.3.4.1), such as PDIP, are endoplasmic reticulum (ER) resident proteins that catalyze protein folding and thiol-disulfide interchange reactions (Desilva et al., 1996 [PubMed 8561901]). NA
ITPR1-AS1 ENSG00000231249 ENSG00000231249 ITPR1 antisense RNA 1 (head to head) NA NA
SMOX ENSG00000088826 54498 spermine oxidase Polyamines are ubiquitous polycationic alkylamines which include spermine, spermidine, putrescine, and agmatine. These molecules participate in a broad range of cellular functions which include cell cycle modulation, scavenging reactive oxygen species, and the control of gene expression. These molecules also play important roles in neurotransmission through their regulation of cell-surface receptor activity, involvement in intracellular signalling pathways, and their putative roles as neurotransmitters. This gene encodes an FAD-containing enzyme that catalyzes the oxidation of spermine to spermidine and secondarily produces hydrogen peroxide. Multiple transcript variants encoding different isoenzymes have been identified for this gene, some of which have failed to demonstrate significant oxidase activity on natural polyamine substrates. The characterized isoenzymes have distinctive biochemical characteristics and substrate specificities, suggesting the existence of additional levels of complexity in polyamine catabolism. NA
KRT8 ENSG00000170421 3856 keratin 8 This gene is a member of the type II keratin family clustered on the long arm of chromosome 12. Type I and type II keratins heteropolymerize to form intermediate-sized filaments in the cytoplasm of epithelial cells. The product of this gene typically dimerizes with keratin 18 to form an intermediate filament in simple single-layered epithelial cells. This protein plays a role in maintaining cellular structural integrity and also functions in signal transduction and cellular differentiation. Mutations in this gene cause cryptogenic cirrhosis. Alternatively spliced transcript variants have been found for this gene. NA
RP11-618K13.2 ENSG00000255498 ENSG00000255498 NA NA NA
RP11-862L9.3 ENSG00000266844 ENSG00000266844 NA NA NA
ITGA3 ENSG00000005884 3675 integrin subunit alpha 3 The gene encodes a member of the integrin alpha chain family of proteins. Integrins are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain that function as cell surface adhesion molecules. The encoded preproprotein is proteolytically processed to generate light and heavy chains that comprise the alpha 3 subunit. This subunit joins with a beta 1 subunit to form an integrin that interacts with extracellular matrix proteins including members of the laminin family. Expression of this gene may be correlated with breast cancer metastasis. NA
C2orf82 ENSG00000182600 389084 chromosome 2 open reading frame 82 NA NA
NA ENSG00000165862 NA NA NA TRUE
MGP ENSG00000111341 4256 matrix Gla protein The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. NA
MYOZ1 ENSG00000177791 58529 myozenin 1 The protein encoded by this gene is primarily expressed in the skeletal muscle, and belongs to the myozenin family. Members of this family function as calcineurin-interacting proteins that help tether calcineurin to the sarcomere of cardiac and skeletal muscle. They play an important role in modulation of calcineurin signaling. NA
ENHO ENSG00000168913 375704 energy homeostasis associated NA NA
RP11-248J18.2 ENSG00000269906 ENSG00000269906 NA NA NA
MTURN ENSG00000180354 222166 maturin, neural progenitor differentiation regulator homolog (Xenopus) NA NA
SLC47A1 ENSG00000142494 55244 solute carrier family 47 member 1 This gene is located within the Smith-Magenis syndrome region on chromosome 17. It encodes a protein of unknown function. NA
KRT7 ENSG00000135480 3855 keratin 7 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the simple epithelia lining the cavities of the internal organs and in the gland ducts and blood vessels. The genes encoding the type II cytokeratins are clustered in a region of chromosome 12q12-q13. Alternative splicing may result in several transcript variants; however, not all variants have been fully described. NA
RP11-370I10.12 ENSG00000269514 ENSG00000269514 NA NA NA
MUC7 ENSG00000171195 4589 mucin 7, secreted This gene encodes a small salivary mucin, which is thought to play a role in facilitating the clearance of bacteria in the oral cavity and to aid in mastication, speech, and swallowing. The central domain of this glycoprotein contains tandem repeats, each composed of 23 amino acids. This antimicrobial protein has antibacterial and antifungal activity. The most common allele contains 6 repeats, and some alleles may be associated with susceptibility to asthma. Alternatively spliced transcript variants with different 5’ UTR, but encoding the same protein, have been found for this gene. NA
SLC37A2 ENSG00000134955 219855 solute carrier family 37 member 2 NA NA
NPR1 ENSG00000169418 4881 natriuretic peptide receptor 1 Guanylyl cyclases, catalyzing the production of cGMP from GTP, are classified as soluble and membrane forms (Garbers and Lowe, 1994 [PubMed 7982997]). The membrane guanylyl cyclases, often termed guanylyl cyclases A through F, form a family of cell-surface receptors with a similar topographic structure: an extracellular ligand-binding domain, a single membrane-spanning domain, and an intracellular region that contains a protein kinase-like domain and a cyclase catalytic domain. GC-A and GC-B function as receptors for natriuretic peptides; they are also referred to as atrial natriuretic peptide receptor A (NPR1) and type B (NPR2; MIM 108961). Also see NPR3 (MIM 108962), which encodes a protein with only the ligand-binding transmembrane and 37-amino acid cytoplasmic domains. NPR1 is a membrane-bound guanylate cyclase that serves as the receptor for both atrial and brain natriuretic peptides (ANP (MIM 108780) and BNP (MIM 600295), respectively). NA
IL15 ENSG00000164136 3600 interleukin 15 The protein encoded by this gene is a cytokine that regulates T and natural killer cell activation and proliferation. This cytokine and interleukin 2 share many biological activities. They are found to bind common hematopoietin receptor subunits, and may compete for the same receptor, and thus negatively regulate each other’s activity. The number of CD8+ memory cells is shown to be controlled by a balance between this cytokine and IL2. This cytokine induces the activation of JAK kinases, as well as the phosphorylation and activation of transcription activators STAT3, STAT5, and STAT6. Studies of the mouse counterpart suggested that this cytokine may increase the expression of apoptosis inhibitor BCL2L1/BCL-x(L), possibly through the transcription activation activity of STAT6, and thus prevent apoptosis. Alternatively spliced transcript variants of this gene have been reported. NA
NEURL1 ENSG00000107954 9148 neuralized E3 ubiquitin protein ligase 1 NA NA
PELI2 ENSG00000139946 57161 pellino E3 ubiquitin protein ligase family member 2 NA NA
CXCL1 ENSG00000163739 2919 C-X-C motif chemokine ligand 1 This antimicrobial gene encodes a member of the CXC subfamily of chemokines. The encoded protein is a secreted growth factor that signals through the G-protein coupled receptor, CXC receptor 2. This protein plays a role in inflammation and as a chemoattractant for neutrophils. Aberrant expression of this protein is associated with the growth and progression of certain tumors. A naturally occurring processed form of this protein has increased chemotactic activity. Alternate splicing results in coding and non-coding variants of this gene. A pseudogene of this gene is found on chromosome 4. NA
ARSG ENSG00000141337 22901 arylsulfatase G The protein encoded by this gene belongs to the sulfatase enzyme family. Sulfatases hydrolyze sulfate esters from sulfated steroids, carbohydrates, proteoglycans, and glycolipids. They are involved in hormone biosynthesis, modulation of cell signaling, and degradation of macromolecules. This protein displays arylsulfatase activity at acidic pH, as is typical of lysosomal sulfatases, and has been shown to localize in the lysosomes. Alternatively spliced transcript variants have been found for this gene. NA
PTX3 ENSG00000163661 5806 pentraxin 3 NA NA
TBXA2R ENSG00000006638 6915 thromboxane A2 receptor This gene encodes a member of the G protein-coupled receptor family. The protein interacts with thromboxane A2 to induce platelet aggregation and regulate hemostasis. A mutation in this gene results in a bleeding disorder. Multiple transcript variants encoding different isoforms have been found for this gene. NA
COL4A4 ENSG00000081052 1286 collagen type IV alpha 4 chain This gene encodes one of the six subunits of type IV collagen, the major structural component of basement membranes. This particular collagen IV subunit, however, is only found in a subset of basement membranes. Like the other members of the type IV collagen gene family, this gene is organized in a head-to-head conformation with another type IV collagen gene so that each gene pair shares a common promoter. Mutations in this gene are associated with type II autosomal recessive Alport syndrome (hereditary glomerulonephropathy) and with familial benign hematuria (thin basement membrane disease). Two transcripts, differing only in their transcription start sites, have been identified for this gene and, as is common for collagen genes, multiple polyadenylation sites are found in the 3’ UTR. NA
TMPRSS5 ENSG00000166682 80975 transmembrane protease, serine 5 This gene encodes a protein that belongs to the serine protease family. Serine proteases are known to be involved in many physiological and pathological processes. Alternative splicing results in multiple transcript variants. NA
TTR ENSG00000118271 7276 transthyretin This gene encodes transthyretin, one of the three prealbumins including alpha-1-antitrypsin, transthyretin and orosomucoid. Transthyretin is a carrier protein; it transports thyroid hormones in the plasma and cerebrospinal fluid, and also transports retinol (vitamin A) in the plasma. The protein consists of a tetramer of identical subunits. More than 80 different mutations in this gene have been reported; most mutations are related to amyloid deposition, affecting predominantly peripheral nerve and/or the heart, and a small portion of the gene mutations is non-amyloidogenic. The diseases caused by mutations include amyloidotic polyneuropathy, euthyroid hyperthyroxinaemia, amyloidotic vitreous opacities, cardiomyopathy, oculoleptomeningeal amyloidosis, meningocerebrovascular amyloidosis, carpal tunnel syndrome, etc. NA
SLC7A5 ENSG00000103257 8140 solute carrier family 7 member 5 NA NA
PKP2 ENSG00000057294 5318 plakophilin 2 This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This gene product may regulate the signaling activity of beta-catenin. Two alternately spliced transcripts encoding two protein isoforms have been identified. A processed pseudogene with high similarity to this locus has been mapped to chromosome 12p13. NA
LRG1 ENSG00000171236 116844 leucine rich alpha-2-glycoprotein 1 The leucine-rich repeat (LRR) family of proteins, including LRG1, have been shown to be involved in protein-protein interaction, signal transduction, and cell adhesion and development. LRG1 is expressed during granulocyte differentiation (O’Donnell et al., 2002 [PubMed 12223515]). NA
ADAMTS7 ENSG00000136378 11173 ADAM metallopeptidase with thrombospondin type 1 motif 7 The protein encoded by this gene is a member of the ADAMTS (a disintegrin and metalloproteinase with thrombospondin motifs) family. Members of this family share several distinct protein modules, including a propeptide region, a metalloproteinase domain, a disintegrin-like domain, and a thrombospondin type 1 (TS) motif. Individual members of this family differ in the number of C-terminal TS motifs, and some have unique C-terminal domains. The encoded preproprotein is proteolytically processed to generate the mature enzyme. This enzyme contains two C-terminal TS motifs and may regulate vascular smooth muscle cell (VSMC) migration. Mutations in this gene may be associated with susceptibility to coronary artery disease. NA
AZGP1 ENSG00000160862 563 alpha-2-glycoprotein 1, zinc-binding NA NA
GRK5 ENSG00000198873 2869 G protein-coupled receptor kinase 5 This gene encodes a member of the guanine nucleotide-binding protein (G protein)-coupled receptor kinase subfamily of the Ser/Thr protein kinase family. The protein phosphorylates the activated forms of G protein-coupled receptors thus initiating their deactivation. It has also been shown to play a role in regulating the motility of polymorphonuclear leukocytes (PMNs). NA
MYH10 ENSG00000133026 4628 myosin, heavy chain 10, non-muscle This gene encodes a member of the myosin superfamily. The protein represents a conventional non-muscle myosin; it should not be confused with the unconventional myosin-10 (MYO10). Myosins are actin-dependent motor proteins with diverse functions including regulation of cytokinesis, cell motility, and cell polarity. Mutations in this gene have been associated with May-Hegglin anomaly and developmental defects in brain and heart. Multiple transcript variants encoding different isoforms have been found for this gene. NA
CHGA ENSG00000100604 1113 chromogranin A The protein encoded by this gene is a member of the chromogranin/secretogranin family of neuroendocrine secretory proteins. It is found in secretory vesicles of neurons and endocrine cells. This gene product is a precursor to three biologically active peptides; vasostatin, pancreastatin, and parastatin. These peptides act as autocrine or paracrine negative modulators of the neuroendocrine system. Two other peptides, catestatin and chromofungin, have antimicrobial activity and antifungal activity, respectively. Two transcript variants encoding different isoforms have been found for this gene. NA
RP11-169K16.4 ENSG00000224459 ENSG00000224459 NA NA NA
AKR7A3 ENSG00000162482 22977 aldo-keto reductase family 7 member A3 Aldo-keto reductases, such as AKR7A3, are involved in the detoxification of aldehydes and ketones. NA
CDH1 ENSG00000039068 999 cadherin 1 This gene encodes a classical cadherin of the cadherin superfamily. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature glycoprotein. This calcium-dependent cell-cell adhesion protein is comprised of five extracellular cadherin repeats, a transmembrane region and a highly conserved cytoplasmic tail. Mutations in this gene are correlated with gastric, breast, colorectal, thyroid and ovarian cancer. Loss of function of this gene is thought to contribute to cancer progression by increasing proliferation, invasion, and/or metastasis. The ectodomain of this protein mediates bacterial adhesion to mammalian cells and the cytoplasmic domain is required for internalization. This gene is present in a gene cluster with other members of the cadherin family on chromosome 16. NA
CHI3L1 ENSG00000133048 1116 chitinase 3 like 1 Chitinases catalyze the hydrolysis of chitin, which is an abundant glycopolymer found in insect exoskeletons and fungal cell walls. The glycoside hydrolase 18 family of chitinases includes eight human family members. This gene encodes a glycoprotein member of the glycosyl hydrolase 18 family. The protein lacks chitinase activity and is secreted by activated macrophages, chondrocytes, neutrophils and synovial cells. The protein is thought to play a role in the process of inflammation and tissue remodeling. NA
TG ENSG00000042832 7038 thyroglobulin Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. NA
MAPK8IP1 ENSG00000121653 9479 mitogen-activated protein kinase 8 interacting protein 1 This gene encodes a regulator of the pancreatic beta-cell function. It is highly similar to JIP-1, a mouse protein known to be a regulator of c-Jun amino-terminal kinase (Mapk8). This protein has been shown to prevent MAPK8 mediated activation of transcription factors, and to decrease IL-1 beta and MAP kinase kinase 1 (MEKK1) induced apoptosis in pancreatic beta cells. This protein also functions as a DNA-binding transactivator of the glucose transporter GLUT2. RE1-silencing transcription factor (REST) is reported to repress the expression of this gene in insulin-secreting beta cells. This gene is found to be mutated in a type 2 diabetes family, and thus is thought to be a susceptibility gene for type 2 diabetes. NA
FAM134B ENSG00000154153 54463 family with sequence similarity 134 member B The protein encoded by this gene is a cis-Golgi transmembrane protein that may be necessary for the long-term survival of nociceptive and autonomic ganglion neurons. Mutations in this gene are a cause of hereditary sensory and autonomic neuropathy type IIB (HSAN IIB), and this gene may also play a role in susceptibility to vascular dementia. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
GAS5 ENSG00000234741 60674 growth arrest specific 5 (non-protein coding) This gene produces a spliced long non-coding RNA and is a member of the 5’ terminal oligo-pyrimidine class of genes. It is a small nucleolar RNA host gene, containing multiple C/D box snoRNA genes in its introns. Part of the secondary RNA structure of the encoded transcript mimics glucocorticoid response element (GRE) which means it can bind to the DNA binding domain of the glucocorticoid receptor (nuclear receptor subfamily 3, group C, member 1). This action blocks the glucocorticoid receptor from being activated and thereby stops it from regulating the transcription of its target genes. This transcript is also thought to regulate the transcriptional activity of other receptors, such as androgen, progesterone and mineralocorticoid receptors, that can bind to its GRE mimic region. Multiple functions have been associated with this transcript, including cellular growth arrest and apoptosis. It has also been identified as a potential tumor suppressor, with its down-regulation associated with cancer in multiple different tissues. NA
AGT ENSG00000135744 183 angiotensinogen The protein encoded by this gene, pre-angiotensinogen or angiotensinogen precursor, is expressed in the liver and is cleaved by the enzyme renin in response to lowered blood pressure. The resulting product, angiotensin I, is then cleaved by angiotensin converting enzyme (ACE) to generate the physiologically active enzyme angiotensin II. The protein is involved in maintaining blood pressure and in the pathogenesis of essential hypertension and preeclampsia. Mutations in this gene are associated with susceptibility to essential hypertension, and can cause renal tubular dysgenesis, a severe disorder of renal tubular development. Defects in this gene have also been associated with non-familial structural atrial fibrillation, and inflammatory bowel disease. NA
KIF1A ENSG00000130294 547 kinesin family member 1A The protein encoded by this gene is a member of the kinesin family and functions as an anterograde motor protein that transports membranous organelles along axonal microtubules. Mutations at this locus have been associated with spastic paraplegia-30 and hereditary sensory neuropathy IIC. Alternatively spliced transcript variants encoding distinct isoforms have been described. NA
IGHG1 ENSG00000211896 ENSG00000211896 immunoglobulin heavy constant gamma 1 (G1m marker) NA NA
RDH10 ENSG00000121039 157506 retinol dehydrogenase 10 (all-trans) This gene encodes a retinol dehydrogenase, which converts all-trans-retinol to all-trans-retinal, with preference for NADP as a cofactor. Studies in mice suggest that this protein is essential for synthesis of embryonic retinoic acid and is required for limb, craniofacial, and organ development. NA
ABCA1 ENSG00000165029 19 ATP binding cassette subfamily A member 1 The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intracellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the ABC1 subfamily. Members of the ABC1 subfamily comprise the only major ABC subfamily found exclusively in multicellular eukaryotes. With cholesterol as its substrate, this protein functions as a cholesterol efflux pump in the cellular lipid removal pathway. Mutations in this gene have been associated with Tangier’s disease and familial high-density lipoprotein deficiency. NA
PSD2 ENSG00000146005 84249 pleckstrin and Sec7 domain containing 2 NA NA
NA ENSG00000203306 NA NA NA TRUE
SLC18A2 ENSG00000165646 6571 solute carrier family 18 member A2 The vesicular monoamine transporter acts to accumulate cytosolic monoamines into synaptic vesicles, using the proton gradient maintained across the synaptic vesicular membrane. Its proper function is essential to the correct activity of the monoaminergic systems that have been implicated in several human neuropsychiatric disorders. The transporter is a site of action of important drugs, including reserpine and tetrabenazine (summary by Peter et al., 1993 [PubMed 7905859]). See also SLC18A1 (MIM 193002). NA
INMT ENSG00000241644 11185 indolethylamine N-methyltransferase N-methylation of endogenous and xenobiotic compounds is a major method by which they are degraded. This gene encodes an enzyme that N-methylates indoles such as tryptamine. Alternative splicing results in multiple transcript variants. Read-through transcription also exists between this gene and the downstream FAM188B (family with sequence similarity 188, member B) gene. NA
INSL3 ENSG00000248099 3640 insulin like 3 This gene encodes a member of the insulin-like hormone superfamily. The encoded protein is mainly produced in gonadal tissues. Studies of the mouse counterpart suggest that this gene may be involved in the development of urogenital tract and female fertility. This protein may also act as a hormone to regulate growth and differentiation of gubernaculum, and thus mediating intra-abdominal testicular descent. Mutations in this gene may lead to cryptorchidism. Alternate splicing results in multiple transcript variants. NA
IGLL5 ENSG00000254709 100423062 immunoglobulin lambda like polypeptide 5 This gene encodes one of the immunoglobulin lambda-like polypeptides. It is located within the immunoglobulin lambda locus but it does not require somatic rearrangement for expression. The first exon of this gene is unrelated to immunoglobulin variable genes; the second and third exons are the immunoglobulin lambda joining 1 and the immunoglobulin lambda constant 1 gene segments. Alternative splicing results in multiple transcript variants. NA
CTRL ENSG00000141086 1506 chymotrypsin like NA NA
EME2 ENSG00000197774 197342 essential meiotic structure-specific endonuclease subunit 2 EME2 forms a heterodimer with MUS81 (MIM 606591) that functions as an XPF (MIM 278760)-type flap/fork endonuclease in DNA repair (Ciccia et al., 2007 [PubMed 17289582]). NA
ITPR1 ENSG00000150995 3708 inositol 1,4,5-trisphosphate receptor type 1 This gene encodes an intracellular receptor for inositol 1,4,5-trisphosphate. Upon stimulation by inositol 1,4,5-trisphosphate, this receptor mediates calcium release from the endoplasmic reticulum. Mutations in this gene cause spinocerebellar ataxia type 15, a disease associated with a heterogeneous group of cerebellar disorders. Multiple transcript variants have been identified for this gene. NA
LCN2 ENSG00000148346 3934 lipocalin 2 This gene encodes a protein that belongs to the lipocalin family. Members of this family transport small hydrophobic molecules such as lipids, steroid hormones and retinoids. The protein encoded by this gene is a neutrophil gelatinase-associated lipocalin and plays a role in innate immunity by limiting bacterial growth as a result of sequestering iron-containing siderophores. The presence of this protein in blood and urine is an early biomarker of acute kidney injury. This protein is thought to be involved in multiple cellular processes, including maintenance of skin homeostasis, and suppression of invasiveness and metastasis. Mice lacking this gene are more susceptible to bacterial infection than wild type mice. NA
ASRGL1 ENSG00000162174 80150 asparaginase like 1 NA NA
ANO1-AS1 ENSG00000254902 ENSG00000254902 ANO1 antisense RNA 1 NA NA
VSNL1 ENSG00000163032 7447 visinin like 1 This gene is a member of the visinin/recoverin subfamily of neuronal calcium sensor proteins. The encoded protein is strongly expressed in granule cells of the cerebellum where it associates with membranes in a calcium-dependent manner and modulates intracellular signaling pathways of the central nervous system by directly or indirectly regulating the activity of adenylyl cyclase. Alternatively spliced transcript variants have been observed, but their full-length nature has not been determined. NA
MYCL ENSG00000116990 4610 v-myc avian myelocytomatosis viral oncogene lung carcinoma derived homolog NA NA
HSPB7 ENSG00000173641 27129 heat shock protein family B (small) member 7 NA NA
HS3ST3B1 ENSG00000125430 9953 heparan sulfate-glucosamine 3-sulfotransferase 3B1 The protein encoded by this gene is a type II integral membrane protein that belongs to the 3-O-sulfotransferases family. These proteins catalyze the addition of sulfate groups at the 3-OH position of glucosamine in heparan sulfate. The substrate specificity of individual members of the family is based on prior modification of the heparan sulfate chain, thus allowing different members of the family to generate binding sites for different proteins on the same heparan sulfate chain. Following treatment with a histone deacetylase inhibitor, expression of this gene is activated in a pancreatic cell line. The increased expression results in promotion of the epithelial-mesenchymal transition. In addition, the modification catalyzed by this protein allows herpes simplex virus membrane fusion and penetration. A very closely related homolog with an almost identical sulfotransferase domain maps less than 1 Mb away. Alternative splicing results in multiple transcript variants. NA
MIEF2 ENSG00000177427 125170 mitochondrial elongation factor 2 This gene encodes an outer mitochondrial membrane protein that functions in the regulation of mitochondrial morphology. It can directly recruit the fission mediator dynamin-related protein 1 (Drp1) to the mitochondrial surface. The gene is located within the Smith-Magenis syndrome region on chromosome 17. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
TPO ENSG00000115705 7173 thyroid peroxidase This gene encodes a membrane-bound glycoprotein. The encoded protein acts as an enzyme and plays a central role in thyroid gland function. The protein functions in the iodination of tyrosine residues in thyroglobulin and phenoxy-ester formation between pairs of iodinated tyrosines to generate the thyroid hormones, thyroxine and triiodothyronine. Mutations in this gene are associated with several disorders of thyroid hormonogenesis, including congenital hypothyroidism, congenital goiter, and thyroid hormone organification defect IIA. Multiple transcript variants encoding distinct isoforms have been identified for this gene, but the full-length nature of some variants has not been determined. NA
SYCE1 ENSG00000171772 93426 synaptonemal complex central element protein 1 NA NA
CUZD1 ENSG00000138161 50624 CUB and zona pellucida like domains 1 NA NA
LDB3 ENSG00000122367 11155 LIM domain binding 3 This gene encodes a PDZ domain-containing protein. PDZ motifs are modular protein-protein interaction domains consisting of 80-120 amino acid residues. PDZ domain-containing proteins interact with each other in cytoskeletal assembly or with other proteins involved in targeting and clustering of membrane proteins. The protein encoded by this gene interacts with alpha-actinin-2 through its N-terminal PDZ domain and with protein kinase C via its C-terminal LIM domains. The LIM domain is a cysteine-rich motif defined by 50-60 amino acids containing two zinc-binding modules. This protein also interacts with all three members of the myozenin family. Mutations in this gene have been associated with myofibrillar myopathy and dilated cardiomyopathy. Alternatively spliced transcript variants encoding different isoforms have been identified; all isoforms have N-terminal PDZ domains while only longer isoforms (1, 2 and 5) have C-terminal LIM domains. NA
NXPH3 ENSG00000182575 11248 neurexophilin 3 NA NA
NDRG1 ENSG00000104419 10397 N-myc downstream regulated 1 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein involved in stress responses, hormone responses, cell growth, and differentiation. The encoded protein is necessary for p53-mediated caspase activation and apoptosis. Mutations in this gene are a cause of Charcot-Marie-Tooth disease type 4D, and expression of this gene may be a prognostic indicator for several types of cancer. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
GMNN ENSG00000112312 51053 geminin, DNA replication inhibitor This gene encodes a protein that plays a critical role in cell cycle regulation. The encoded protein inhibits DNA replication by binding to DNA replication factor Cdt1, preventing the incorporation of minichromosome maintenance proteins into the pre-replication complex. The encoded protein is expressed during the S and G2 phases of the cell cycle and is degraded by the anaphase-promoting complex during the metaphase-anaphase transition. Increased expression of this gene may play a role in several malignancies including colon, rectal and breast cancer. Alternatively spliced transcript variants have been observed for this gene, and two pseudogenes of this gene are located on the short arm of chromosome 16. NA
CNIH2 ENSG00000174871 254263 cornichon family AMPA receptor auxiliary protein 2 The protein encoded by this gene is an auxiliary subunit of the ionotropic glutamate receptor of the AMPA subtype. AMPA receptors mediate fast synaptic neurotransmission in the central nervous system. This protein has been reported to interact with the Type I AMPA receptor regulatory protein isoform gamma-8 to control assembly of hippocampal AMPA receptor complexes, thereby modulating receptor gating and pharmacology. Alternative splicing results in multiple transcript variants. NA
EEF1B2P2 ENSG00000213864 ENSG00000213864 eukaryotic translation elongation factor 1 beta 2 pseudogene 2 NA NA
FZD1 ENSG00000157240 8321 frizzled class receptor 1 Members of the ‘frizzled’ gene family encode 7-transmembrane domain proteins that are receptors for Wnt signaling proteins. The FZD1 protein contains a signal peptide, a cysteine-rich domain in the N-terminal extracellular region, 7 transmembrane domains, and a C-terminal PDZ domain-binding motif. The FZD1 transcript is expressed in various tissues. NA
# Save the Ensembl IDs queried for factor 3, one per line, with no header or quoting.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 3, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
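The annotation tables in this report show that a single query ID can come back with more than one hit, and that unmatched IDs are flagged with `notfound = TRUE`. If the exported list is meant to hold each annotated gene once, something like the following could be applied before writing. This is a minimal sketch that assumes only the `query` and `notfound` columns seen in the tables; the helper name `clean_query_ids` and the toy data frame are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: keep each matched query ID once, in original order.
clean_query_ids <- function(res) {
  res <- as.data.frame(res)
  # `notfound` is NA for matched queries and TRUE for unmatched ones;
  # the column may be absent when every query matched.
  matched <- if ("notfound" %in% names(res)) {
    is.na(res$notfound)
  } else {
    rep(TRUE, nrow(res))
  }
  unique(as.character(res$query[matched]))
}

# Toy stand-in for a queryMany() result (illustrative values, not real output).
toy <- data.frame(
  query    = c("ENSG_A", "ENSG_B", "ENSG_B", "ENSG_C"),
  symbol   = c("GENE1", "GENE2", "GENE2B", NA),
  notfound = c(NA, NA, NA, TRUE),
  stringsAsFactors = FALSE
)

ids <- clean_query_ids(toy)
print(ids)
```

The resulting vector could be passed to `write.table` in place of `as.factor(out$query)`. Whether unmatched IDs should be dropped or kept depends on how the exported file is used downstream.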

Factor 4 Annotations

out <- mygene::queryMany(gene_list[4,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name X_id symbol query summary notfound
joining chain of multimeric IgA and IgM 3512 JCHAIN ENSG00000132465 NA NA
immunoglobulin heavy constant mu ENSG00000211899 IGHM ENSG00000211899 NA NA
immunoglobulin lambda constant 2 (Kern-Oz- marker) ENSG00000211677 IGLC2 ENSG00000211677 NA NA
immunoglobulin lambda constant 3 (Kern-Oz+ marker) ENSG00000211679 IGLC3 ENSG00000211679 NA NA
immunoglobulin lambda like polypeptide 5 100423062 IGLL5 ENSG00000254709 This gene encodes one of the immunoglobulin lambda-like polypeptides. It is located within the immunoglobulin lambda locus but it does not require somatic rearrangement for expression. The first exon of this gene is unrelated to immunoglobulin variable genes; the second and third exons are the immunoglobulin lambda joining 1 and the immunoglobulin lambda constant 1 gene segments. Alternative splicing results in multiple transcript variants. NA
RUN domain containing 3A 10900 RUNDC3A ENSG00000108309 NA NA
immunoglobulin lambda constant 1 (Mcg marker) ENSG00000211675 IGLC1 ENSG00000211675 NA NA
placenta specific 8 51316 PLAC8 ENSG00000145287 NA NA
uncharacterized LOC100507387 100507387 LOC100507387 ENSG00000182230 NA NA
family with sequence similarity 153 member B 202134 FAM153B ENSG00000182230 NA NA
immunoglobulin heavy constant alpha 1 ENSG00000211895 IGHA1 ENSG00000211895 NA NA
immunoglobulin heavy constant gamma 2 (G2m marker) ENSG00000211893 IGHG2 ENSG00000211893 NA NA
chromosome 2 open reading frame 66 401027 C2orf66 ENSG00000187944 NA NA
NA ENSG00000253364 RP11-731F5.2 ENSG00000253364 NA NA
CD79a molecule 973 CD79A ENSG00000105369 The B lymphocyte antigen receptor is a multimeric complex that includes the antigen-specific component, surface immunoglobulin (Ig). Surface Ig non-covalently associates with two other proteins, Ig-alpha and Ig-beta, which are necessary for expression and function of the B-cell antigen receptor. This gene encodes the Ig-alpha protein of the B-cell antigen component. Alternatively spliced transcript variants encoding different isoforms have been described. NA
ubiquitin conjugating enzyme E2 C 11065 UBE2C ENSG00000175063 The modification of proteins with ubiquitin is an important cellular mechanism for targeting abnormal or short-lived proteins for degradation. Ubiquitination involves at least three classes of enzymes: ubiquitin-activating enzymes, ubiquitin-conjugating enzymes, and ubiquitin-protein ligases. This gene encodes a member of the E2 ubiquitin-conjugating enzyme family. The encoded protein is required for the destruction of mitotic cyclins and for cell cycle progression, and may be involved in cancer progression. Multiple transcript variants encoding different isoforms have been found for this gene. Pseudogenes of this gene have been defined on chromosomes 4, 14, 15, 18, and 19. NA
kinesin family member C3 3801 KIFC3 ENSG00000140859 This gene encodes a member of the kinesin-14 family of microtubule motors. Members of this family play a role in the formation, maintenance and remodeling of the bipolar mitotic spindle. The protein encoded by this gene has cytoplasmic functions in the interphase cells. It may also be involved in the final stages of cytokinesis. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
RAP1 GTPase activating protein 5909 RAP1GAP ENSG00000076864 This gene encodes a type of GTPase-activating-protein (GAP) that down-regulates the activity of the ras-related RAP1 protein. RAP1 acts as a molecular switch by cycling between an inactive GDP-bound form and an active GTP-bound form. The product of this gene, RAP1GAP, promotes the hydrolysis of bound GTP and hence returns RAP1 to the inactive state whereas other proteins, guanine nucleotide exchange factors (GEFs), act as RAP1 activators by facilitating the conversion of RAP1 from the GDP- to the GTP-bound form. In general, ras subfamily proteins, such as RAP1, play key roles in receptor-linked signaling pathways that control cell growth and differentiation. RAP1 plays a role in diverse processes such as cell proliferation, adhesion, differentiation, and embryogenesis. Alternative splicing results in multiple transcript variants encoding distinct proteins. NA
non-SMC condensin I complex subunit H 23397 NCAPH ENSG00000121152 This gene encodes a member of the barr gene family and a regulatory subunit of the condensin complex. This complex is required for the conversion of interphase chromatin into condensed chromosomes. The protein encoded by this gene is associated with mitotic chromosomes, except during the early phase of chromosome condensation. During interphase, the protein has a distinct punctate nucleolar localization. Alternatively spliced transcript variants encoding different proteins have been described. NA
immunoglobulin heavy constant gamma 3 (G3m marker) ENSG00000211897 IGHG3 ENSG00000211897 NA NA
immunoglobulin heavy constant gamma 1 (G1m marker) ENSG00000211896 IGHG1 ENSG00000211896 NA NA
MAD2 mitotic arrest deficient-like 1 (yeast) 4085 MAD2L1 ENSG00000164109 MAD2L1 is a component of the mitotic spindle assembly checkpoint that prevents the onset of anaphase until all chromosomes are properly aligned at the metaphase plate. MAD2L1 is related to the MAD2L2 gene located on chromosome 1. A MAD2 pseudogene has been mapped to chromosome 14. NA
nucleolar and spindle associated protein 1 51203 NUSAP1 ENSG00000137804 NUSAP1 is a nucleolar-spindle-associated protein that plays a role in spindle microtubule organization (Raemaekers et al., 2003 [PubMed 12963707]). NA
phospholipase D family member 4 122618 PLD4 ENSG00000166428 NA NA
CD22 molecule 933 CD22 ENSG00000012124 NA NA
NA NA NA ENSG00000256390 NA TRUE
C-X-C motif chemokine ligand 14 9547 CXCL14 ENSG00000145824 This antimicrobial gene belongs to the cytokine gene family which encode secreted proteins involved in immunoregulatory and inflammatory processes. The protein encoded by this gene is structurally related to the CXC (Cys-X-Cys) subfamily of cytokines. Members of this subfamily are characterized by two cysteines separated by a single amino acid. This cytokine displays chemotactic activity for monocytes but not for lymphocytes, dendritic cells, neutrophils or macrophages. It has been implicated that this cytokine is involved in the homeostasis of monocyte-derived macrophages rather than in inflammation. NA
proopiomelanocortin 5443 POMC ENSG00000115138 This gene encodes a preproprotein that undergoes extensive, tissue-specific, post-translational processing via cleavage by subtilisin-like enzymes known as prohormone convertases. There are eight potential cleavage sites within the preproprotein and, depending on tissue type and the available convertases, processing may yield as many as ten biologically active peptides involved in diverse cellular functions. The encoded protein is synthesized mainly in corticotroph cells of the anterior pituitary where four cleavage sites are used; adrenocorticotrophin, essential for normal steroidogenesis and the maintenance of normal adrenal weight, and lipotropin beta are the major end products. In other tissues, including the hypothalamus, placenta, and epithelium, all cleavage sites may be used, giving rise to peptides with roles in pain and energy homeostasis, melanocyte stimulation, and immune modulation. These include several distinct melanotropins, lipotropins, and endorphins that are contained within the adrenocorticotrophin and beta-lipotropin peptides. The antimicrobial melanotropin alpha peptide exhibits antibacterial and antifungal activity. Mutations in this gene have been associated with early onset obesity, adrenal insufficiency, and red hair pigmentation. Alternatively spliced transcript variants encoding the same protein have been described. NA
dedicator of cytokinesis 10 55619 DOCK10 ENSG00000135905 This gene encodes a member of the dedicator of cytokinesis protein family. Members of this family are guanosine nucleotide exchange factors for Rho GTPases and defined by the presence of conserved DOCK-homology regions. The encoded protein belongs to the D (or Zizimin) subfamily of DOCK proteins, which also contain an N-terminal pleckstrin homology domain. Alternatively spliced transcript variants that encode different isoforms have been described. NA
TPX2, microtubule nucleation factor 22974 TPX2 ENSG00000088325 NA NA
NA ENSG00000223353 RP11-290P14.2 ENSG00000223353 NA NA
cell division cycle associated 3 83461 CDCA3 ENSG00000111665 NA NA
mucin like 1 118430 MUCL1 ENSG00000172551 NA NA
cholinergic receptor nicotinic epsilon subunit 1145 CHRNE ENSG00000108556 Acetylcholine receptors at mature mammalian neuromuscular junctions are pentameric protein complexes composed of four subunits in the ratio of two alpha subunits to one beta, one epsilon, and one delta subunit. The acetylcholine receptor changes subunit composition shortly after birth when the epsilon subunit replaces the gamma subunit seen in embryonic receptors. Mutations in the epsilon subunit are associated with congenital myasthenic syndrome. NA
RAD51 recombinase 5888 RAD51 ENSG00000051180 The protein encoded by this gene is a member of the RAD51 protein family. RAD51 family members are highly similar to bacterial RecA and Saccharomyces cerevisiae Rad51, and are known to be involved in the homologous recombination and repair of DNA. This protein can interact with the ssDNA-binding protein RPA and RAD52, and it is thought to play roles in homologous pairing and strand transfer of DNA. This protein is also found to interact with BRCA1 and BRCA2, which may be important for the cellular response to DNA damage. BRCA2 is shown to regulate both the intracellular localization and DNA-binding ability of this protein. Loss of these controls following BRCA2 inactivation may be a key event leading to genomic instability and tumorigenesis. Multiple transcript variants encoding different isoforms have been found for this gene. NA
leucine rich repeat containing 73 221424 LRRC73 ENSG00000204052 NA NA
ankyrin repeat and SOCS box containing 2 51676 ASB2 ENSG00000100628 This gene encodes a member of the ankyrin repeat and SOCS box-containing (ASB) protein family. These proteins play a role in protein degradation by coupling suppressor of cytokine signalling (SOCS) proteins with the elongin BC complex. The encoded protein is a subunit of a multimeric E3 ubiquitin ligase complex that mediates the degradation of actin-binding proteins. This gene plays a role in retinoic acid-induced growth inhibition and differentiation of myeloid leukemia cells. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
WD repeat domain 66 144406 WDR66 ENSG00000158023 The protein encoded by this gene belongs to the WD repeat-containing family of proteins, which function in the formation of protein-protein complexes in a variety of biological pathways. This family member appears to function in the determination of mean platelet volume (MPV), and polymorphisms in this gene have been associated with variance in MPV. Alternative splicing of this gene results in multiple transcript variants. NA
cell division cycle associated 5 113130 CDCA5 ENSG00000146670 NA NA
zinc finger C2HC-type containing 1C 79696 ZC2HC1C ENSG00000119703 NA NA
GINS complex subunit 2 51659 GINS2 ENSG00000131153 The yeast heterotetrameric GINS complex is made up of Sld5 (GINS4; MIM 610611), Psf1 (GINS1; MIM 610608), Psf2, and Psf3 (GINS3; MIM 610610). The formation of this complex is essential for the initiation of DNA replication in yeast and Xenopus egg extracts (Ueno et al., 2005 [PubMed 16287864]). See GINS1 for additional information about the GINS complex. NA
CATIP antisense RNA 1 ENSG00000225062 CATIP-AS1 ENSG00000225062 NA NA
family with sequence similarity 153 member C ENSG00000204677 FAM153C ENSG00000204677 NA NA
thyroglobulin 7038 TG ENSG00000042832 Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. NA
protein tyrosine phosphatase, receptor type O 5800 PTPRO ENSG00000151490 This gene encodes a member of the R3 subtype family of receptor-type protein tyrosine phosphatases. These proteins are localized to the apical surface of polarized cells and may have tissue-specific functions through activation of Src family kinases. This gene contains two distinct promoters, and alternatively spliced transcript variants encoding multiple isoforms have been observed. The encoded proteins may have multiple isoform-specific and tissue-specific functions, including the regulation of osteoclast production and activity, inhibition of cell proliferation and facilitation of apoptosis. This gene is a candidate tumor suppressor, and decreased expression of this gene has been observed in several types of cancer. NA
immunoglobulin heavy constant alpha 2 (A2m marker) ENSG00000211890 IGHA2 ENSG00000211890 NA NA
glycoprotein hormones, alpha polypeptide 1081 CGA ENSG00000135346 The four human glycoprotein hormones chorionic gonadotropin (CG), luteinizing hormone (LH), follicle stimulating hormone (FSH), and thyroid stimulating hormone (TSH) are dimers consisting of alpha and beta subunits that are associated noncovalently. The alpha subunits of these hormones are identical, however, their beta chains are unique and confer biological specificity. The protein encoded by this gene is the alpha subunit and belongs to the glycoprotein hormones alpha chain family. Two transcript variants encoding different isoforms have been found for this gene. NA
tachykinin receptor 2 6865 TACR2 ENSG00000075073 This gene belongs to a family of genes that function as receptors for tachykinins. Receptor affinities are specified by variations in the 5’-end of the sequence. The receptors belonging to this family are characterized by interactions with G proteins and 7 hydrophobic transmembrane regions. This gene encodes the receptor for the tachykinin neuropeptide substance K, also referred to as neurokinin A. NA
Opa interacting protein 5 11339 OIP5 ENSG00000104147 The protein encoded by this gene localizes to centromeres, where it is essential for recruitment of CENP-A through the mediator Holliday junction recognition protein. Expression of this gene is upregulated in several cancers, making it a putative therapeutic target. Two transcript variants encoding different isoforms have been found for this gene. NA
MOK protein kinase 5891 MOK ENSG00000080823 This gene belongs to the MAP kinase superfamily. The gene was found to be regulated by caudal type transcription factor 2 (Cdx2) protein. The encoded protein, which is localized to epithelial cells in the intestinal crypt, may play a role in growth arrest and differentiation of cells of upper crypt and lower villus regions. Multiple alternatively spliced transcript variants encoding different isoforms have been observed for this gene. NA
melanocortin 1 receptor 4157 MC1R ENSG00000258839 This intronless gene encodes the receptor protein for melanocyte-stimulating hormone (MSH). The encoded protein, a seven pass transmembrane G protein coupled receptor, controls melanogenesis. Two types of melanin exist: red pheomelanin and black eumelanin. Gene mutations that lead to a loss in function are associated with increased pheomelanin production, which leads to lighter skin and hair color. Eumelanin is photoprotective but pheomelanin may contribute to UV-induced skin damage by generating free radicals upon UV radiation. Binding of MSH to its receptor activates the receptor and stimulates eumelanin synthesis. This receptor is a major determining factor in sun sensitivity and is a genetic risk factor for melanoma and non-melanoma skin cancer. Over 30 variant alleles have been identified which correlate with skin and hair color, providing evidence that this gene is an important component in determining normal human pigment variation. NA
homer scaffolding protein 2 9455 HOMER2 ENSG00000103942 This gene encodes a member of the homer family of dendritic proteins. Members of this family regulate group 1 metabotropic glutamate receptor function. The encoded protein is a postsynaptic density scaffolding protein. Alternative splicing results in multiple transcript variants. Two related pseudogenes have been identified on chromosome 14. NA
immunoglobulin superfamily member 22 283284 IGSF22 ENSG00000179057 NA NA
dynein regulatory complex subunit 3 83450 DRC3 ENSG00000171962 NA NA
centromere protein K 64105 CENPK ENSG00000123219 CENPK is a subunit of a CENPH (MIM 605607)-CENPI (MIM 300065)-associated centromeric complex that targets CENPA (MIM 117139) to centromeres and is required for proper kinetochore function and mitotic progression (Okada et al., 2006 [PubMed 16622420]). NA
meiotic nuclear divisions 1 84057 MND1 ENSG00000121211 The product of the MND1 gene associates with HOP2 (MIM 608665) to form a stable heterodimeric complex that binds DNA and stimulates the recombinase activity of RAD51 (MIM 179617) and DMC1 (MIM 602721) (Chi et al., 2007 [PubMed 17639080]). Both the MND1 and HOP2 genes are indispensable for meiotic recombination. NA
transmembrane protein 67 91147 TMEM67 ENSG00000164953 The protein encoded by this gene localizes to the primary cilium and to the plasma membrane. The gene functions in centriole migration to the apical membrane and formation of the primary cilium. Multiple transcript variants encoding different isoforms have been found for this gene. Defects in this gene are a cause of Meckel syndrome type 3 (MKS3) and Joubert syndrome type 6 (JBTS6). NA
nuclear receptor subfamily 6 group A member 1 2649 NR6A1 ENSG00000148200 This gene encodes an orphan nuclear receptor which is a member of the nuclear hormone receptor family. Its expression pattern suggests that it may be involved in neurogenesis and germ cell development. The protein can homodimerize and bind DNA, but in vivo targets have not been identified. Alternate splicing results in multiple transcript variants. NA
NA NA NA ENSG00000034063 NA TRUE
oxysterol binding protein 2 23762 OSBP2 ENSG00000184792 The protein encoded by this gene contains a pleckstrin homology (PH) domain and an oxysterol-binding region. It binds oxysterols such as 7-ketocholesterol and may inhibit their cytotoxicity. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
NA ENSG00000250654 RP11-834C11.7 ENSG00000250654 NA NA
NA NA NA ENSG00000234603 NA TRUE
cell division cycle associated 8 55143 CDCA8 ENSG00000134690 This gene encodes a component of the chromosomal passenger complex. This complex is an essential regulator of mitosis and cell division. This protein is cell-cycle regulated and is required for chromatin-induced microtubule stabilization and spindle formation. Alternate splicing results in multiple transcript variants. Pseudogenes of this gene are found on chromosomes 7, 8 and 16. NA
ZNF667 antisense RNA 1 (head to head) ENSG00000166770 ZNF667-AS1 ENSG00000166770 NA NA
calmegin 1047 CLGN ENSG00000153132 Calmegin is a testis-specific endoplasmic reticulum chaperone protein. CLGN may play a role in spermatogenesis and infertility. NA
cilia and flagella associated protein 70 118491 CFAP70 ENSG00000156042 NA NA
NA NA NA ENSG00000260655 NA TRUE
zinc finger protein 667 63934 ZNF667 ENSG00000198046 NA NA
thymidine kinase 1 7083 TK1 ENSG00000167900 NA NA
aldolase, fructose-bisphosphate B 229 ALDOB ENSG00000136872 Fructose-1,6-bisphosphate aldolase (EC 4.1.2.13) is a tetrameric glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Vertebrates have 3 aldolase isozymes which are distinguished by their electrophoretic and catalytic properties. Differences indicate that aldolases A, B, and C are distinct proteins, the products of a family of related ‘housekeeping’ genes exhibiting developmentally regulated expression of the different isozymes. The developing embryo produces aldolase A, which is produced in even greater amounts in adult muscle where it can be as much as 5% of total cellular protein. In adult liver, kidney and intestine, aldolase A expression is repressed and aldolase B is produced. In brain and other nervous tissue, aldolase A and C are expressed about equally. There is a high degree of homology between aldolase A and C. Defects in ALDOB cause hereditary fructose intolerance. NA
dpy-19 like 2 283417 DPY19L2 ENSG00000177990 The protein encoded by this gene belongs to the dpy-19 family. It is highly expressed in testis, and is required for sperm head elongation and acrosome formation during spermatogenesis. Mutations in this gene are associated with an infertility disorder, spermatogenic failure type 9 (SPGF9). NA
leukemia inhibitory factor 3976 LIF ENSG00000128342 The protein encoded by this gene is a pleiotropic cytokine with roles in several different systems. It is involved in the induction of hematopoietic differentiation in normal and myeloid leukemia cells, induction of neuronal cell differentiation, regulator of mesenchymal to epithelial conversion during kidney development, and may also have a role in immune tolerance at the maternal-fetal interface. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
ras-related C3 botulinum toxin substrate 2 (rho family, small GTP binding protein Rac2) 5880 RAC2 ENSG00000128340 This gene encodes a member of the Ras superfamily of small guanosine triphosphate (GTP)-metabolizing proteins. The encoded protein localizes to the plasma membrane, where it regulates diverse processes, such as secretion, phagocytosis, and cell polarization. Activity of this protein is also involved in the generation of reactive oxygen species. Mutations in this gene are associated with neutrophil immunodeficiency syndrome. There is a pseudogene for this gene on chromosome 6. NA
PARP1 binding protein 55010 PARPBP ENSG00000185480 NA NA
family with sequence similarity 71 member F2 346653 FAM71F2 ENSG00000205085 NA NA
nectin cell adhesion molecule 2 5819 NECTIN2 ENSG00000130202 This gene encodes a single-pass type I membrane glycoprotein with two Ig-like C2-type domains and an Ig-like V-type domain. This protein is one of the plasma membrane components of adherens junctions. It also serves as an entry for certain mutant strains of herpes simplex virus and pseudorabies virus, and it is involved in cell to cell spreading of these viruses. Variations in this gene have been associated with differences in the severity of multiple sclerosis. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. NA
RAD51 associated protein 1 10635 RAD51AP1 ENSG00000111247 NA NA
polo like kinase 1 5347 PLK1 ENSG00000166851 The Ser/Thr protein kinase encoded by this gene belongs to the CDC5/Polo subfamily. It is highly expressed during mitosis and elevated levels are found in many different types of cancer. Depletion of this protein in cancer cells dramatically inhibited cell proliferation and induced apoptosis; hence, it is a target for cancer therapy. NA
calcium/calmodulin dependent protein kinase II beta 816 CAMK2B ENSG00000058404 The product of this gene belongs to the serine/threonine protein kinase family and to the Ca(2+)/calmodulin-dependent protein kinase subfamily. Calcium signaling is crucial for several aspects of plasticity at glutamatergic synapses. In mammalian cells, the enzyme is composed of four different chains: alpha, beta, gamma, and delta. The product of this gene is a beta chain. It is possible that distinct isoforms of this chain have different cellular localizations and interact differently with calmodulin. Alternative splicing results in multiple transcript variants. NA
deoxyribonuclease I-like 2 1775 DNASE1L2 ENSG00000167968 NA NA
CD48 molecule 962 CD48 ENSG00000117091 This gene encodes a member of the CD2 subfamily of immunoglobulin-like receptors which includes SLAM (signaling lymphocyte activation molecules) proteins. The encoded protein is found on the surface of lymphocytes and other immune cells, dendritic cells and endothelial cells, and participates in activation and differentiation pathways in these cells. The encoded protein does not have a transmembrane domain but is held at the cell surface by a GPI anchor via a C-terminal domain, which may be cleaved to yield a soluble form of the receptor. Multiple transcript variants encoding different isoforms have been found for this gene. NA
kinesin family member 15 56992 KIF15 ENSG00000163808 NA NA
cyclin A2 890 CCNA2 ENSG00000145386 The protein encoded by this gene belongs to the highly conserved cyclin family, whose members are characterized by a dramatic periodicity in protein abundance through the cell cycle. Cyclins function as regulators of CDK kinases. Different cyclins exhibit distinct expression and degradation patterns which contribute to the temporal coordination of each mitotic event. In contrast to cyclin A1, which is present only in germ cells, this cyclin is expressed in all tissues tested. This cyclin binds and activates CDC2 or CDK2 kinases, and thus promotes both cell cycle G1/S and G2/M transitions. NA
alkaline phosphatase, liver/bone/kidney 249 ALPL ENSG00000162551 This gene encodes a member of the alkaline phosphatase family of proteins. There are at least four distinct but related alkaline phosphatases: intestinal, placental, placental-like, and liver/bone/kidney (tissue non-specific). The first three are located together on chromosome 2, while the tissue non-specific form is located on chromosome 1. The product of this gene is a membrane bound glycosylated enzyme that is not expressed in any particular tissue and is, therefore, referred to as the tissue-nonspecific form of the enzyme. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature enzyme. This enzyme may play a role in bone mineralization. Mutations in this gene have been linked to hypophosphatasia, a disorder that is characterized by hypercalcemia and skeletal defects. NA
natriuretic peptide B 4879 NPPB ENSG00000120937 This gene is a member of the natriuretic peptide family and encodes a secreted protein which functions as a cardiac hormone. The protein undergoes two cleavage events, one within the cell and a second after secretion into the blood. The protein’s biological actions include natriuresis, diuresis, vasorelaxation, inhibition of renin and aldosterone secretion, and a key role in cardiovascular homeostasis. A high concentration of this protein in the bloodstream is indicative of heart failure. The protein also acts as an antimicrobial peptide with antibacterial and antifungal activity. Mutations in this gene have been associated with postmenopausal osteoporosis. NA
eukaryotic translation elongation factor 1 alpha 2 1917 EEF1A2 ENSG00000101210 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 2) is expressed in brain, heart and skeletal muscle, and the other isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas. This gene may be critical in the development of ovarian cancer. NA
lymphotoxin beta 4050 LTB ENSG00000227507 Lymphotoxin beta is a type II membrane protein of the TNF family. It anchors lymphotoxin-alpha to the cell surface through heterotrimer formation. The predominant form on the lymphocyte surface is the lymphotoxin-alpha 1/beta 2 complex (i.e. 1 molecule alpha/2 molecules beta) and this complex is the primary ligand for the lymphotoxin-beta receptor. The minor complex is lymphotoxin-alpha 2/beta 1. LTB is an inducer of the inflammatory response system and involved in normal development of lymphoid tissue. Lymphotoxin-beta isoform b is unable to complex with lymphotoxin-alpha, suggesting a function for lymphotoxin-beta which is independent of lymphotoxin-alpha. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
ATPase phospholipid transporting 11A 23250 ATP11A ENSG00000068650 The protein encoded by this gene is an integral membrane ATPase. The encoded protein is probably phosphorylated in its intermediate state and likely drives the transport of ions such as calcium across membranes. Two transcript variants encoding different isoforms have been found for this gene. NA
NAD(P)H quinone dehydrogenase 1 1728 NQO1 ENSG00000181019 This gene is a member of the NAD(P)H dehydrogenase (quinone) family and encodes a cytoplasmic 2-electron reductase. This FAD-binding protein forms homodimers and reduces quinones to hydroquinones. This protein’s enzymatic activity prevents the one electron reduction of quinones that results in the production of radical species. Mutations in this gene have been associated with tardive dyskinesia (TD), an increased risk of hematotoxicity after exposure to benzene, and susceptibility to various forms of cancer. Altered expression of this protein has been seen in many tumors and is also associated with Alzheimer’s disease (AD). Alternate transcriptional splice variants, encoding different isoforms, have been characterized. NA
myosin light chain 4 4635 MYL4 ENSG00000198336 Myosin is a hexameric ATPase cellular motor protein. It is composed of two myosin heavy chains, two nonphosphorylatable myosin alkali light chains, and two phosphorylatable myosin regulatory light chains. This gene encodes a myosin alkali light chain that is found in embryonic muscle and adult atria. Two alternatively spliced transcript variants encoding the same protein have been found for this gene. NA
carboxypeptidase Q 10404 CPQ ENSG00000104324 This gene encodes a metallopeptidase that belongs to the peptidase M28 family. The encoded protein may catalyze the cleavage of dipeptides with unsubstituted terminals into amino acids. NA
small integral membrane protein 1 (Vel blood group) 388588 SMIM1 ENSG00000235169 This gene encodes a small, conserved protein that participates in red blood cell formation. The encoded protein is localized to the cell membrane and is the antigen for the Vel blood group. Alternative splicing results in different transcript variants that encode the same protein. NA
microRNA 3917 100500808 MIR3917 ENSG00000264021 microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop. NA
membrane associated ring-CH-type finger 3 115123 MARCH3 ENSG00000173926 This gene encodes a member of the membrane-associated RING-CH (MARCH) family. The encoded protein is an E3 ubiquitin-protein ligase that may be involved in regulation of the endosomal transport pathway. NA
dihydrofolate reductase pseudogene 1 ENSG00000188985 DHFRP1 ENSG00000188985 NA NA
NA NA NA ENSG00000237485 NA TRUE
phosphatidylinositol glycan anchor biosynthesis class Z 80235 PIGZ ENSG00000119227 The glycosylphosphatidylinositol (GPI) anchor is a glycolipid found on many blood cells that serves to anchor proteins to the cell surface. This gene encodes a protein that is localized to the endoplasmic reticulum, and is involved in GPI anchor biosynthesis. As shown for the yeast homolog, which is a member of a family of dolichol-phosphate-mannose (Dol-P-Man)-dependent mannosyltransferases, this protein can also add a side-branching fourth mannose to GPI precursors during the assembly of GPI anchors. NA
centromere protein M 79019 CENPM ENSG00000100162 The protein encoded by this gene is an inner protein of the kinetochore, the multi-protein complex that binds spindle microtubules to regulate chromosome segregation during cell division. It belongs to the constitutive centromere-associated network protein group, whose members interact with outer kinetochore proteins and help to maintain centromere identity at each cell division cycle. The protein is structurally related to GTPases but cannot bind guanosine triphosphate. A point mutation that affects interaction with another constitutive centromere-associated network protein, CENP-I, impairs kinetochore assembly and chromosome alignment, suggesting that it is required for kinetochore formation. Alternative splicing results in multiple transcript variants. NA
NA ENSG00000256663 RP11-424C20.2 ENSG00000256663 NA NA
zinc finger protein 876, pseudogene 642280 ZNF876P ENSG00000198155 NA NA
troponin I3, cardiac type 7137 TNNI3 ENSG00000129991 Troponin I (TnI), along with troponin T (TnT) and troponin C (TnC), is one of 3 subunits that form the troponin complex of the thin filaments of striated muscle. TnI is the inhibitory subunit; blocking actin-myosin interactions and thereby mediating striated muscle relaxation. The TnI subfamily contains three genes: TnI-skeletal-fast-twitch, TnI-skeletal-slow-twitch, and TnI-cardiac. This gene encodes the TnI-cardiac protein and is exclusively expressed in cardiac muscle tissues. Mutations in this gene cause familial hypertrophic cardiomyopathy type 7 (CMH7) and familial restrictive cardiomyopathy (RCM). NA
# Save the queried Ensembl IDs for factor 4, one per line, for downstream use.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 4, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
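
The same export step recurs for each factor with only the index changing, so it could be wrapped in a small function. The following is a minimal sketch, not part of the original analysis: `write_factor_genes` is a hypothetical helper, and the example writes to a temporary directory rather than to `../utilities/`.

```r
# Hypothetical helper: write the query IDs for one factor to a text file,
# one ID per line, using the same file naming scheme as above.
write_factor_genes <- function(ids, factor_index, out_dir) {
  out_file <- file.path(out_dir, paste0("gene_names_clus_", factor_index, ".txt"))
  write.table(as.character(ids), out_file,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(out_file)
}

# Toy usage with three IDs copied from the factor 4 table,
# written to a temporary directory (an assumption for illustration).
ids <- c("ENSG00000227507", "ENSG00000068650", "ENSG00000181019")
path <- write_factor_genes(ids, 4, tempdir())
readLines(path)
```

In the analysis itself the call would presumably take `out$query` and the factor number, with the output directory set to wherever the gene lists are collected.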

Factor 5 Annotations

out <- mygene::queryMany(gene_list[5, ], scopes = "ensembl.gene",
                         fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
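
The annotation tables in this document carry a `notfound` column that is `TRUE` for IDs the query could not resolve. A minimal sketch of separating those rows from the annotated ones is given here. It uses a toy data frame (`out_toy`, a stand-in built from three rows of this document) rather than a live query, and assumes the real result has the same `query` and `notfound` columns.

```r
# Toy stand-in for the query result; values are copied from this document.
out_toy <- data.frame(
  query    = c("ENSG00000245694", "ENSG00000140932", "ENSG00000180672"),
  symbol   = c("CRNDE", "CMTM2", NA),
  notfound = c(NA, NA, TRUE),
  stringsAsFactors = FALSE
)

# notfound is NA for resolved queries and TRUE for unresolved ones,
# so guard against NA before using it as a logical index.
is_missing  <- !is.na(out_toy$notfound) & out_toy$notfound
missing_ids <- out_toy$query[is_missing]
annotated   <- out_toy[!is_missing, ]

missing_ids
nrow(annotated)
```

Listing the unresolved IDs separately makes it easier to check them by hand, since they otherwise appear only as rows of `NA` in the table.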
kable(as.data.frame(out))
query X_id name symbol summary notfound
ENSG00000245694 ENSG00000245694 colorectal neoplasia differentially expressed (non-protein coding) CRNDE NA NA
ENSG00000140932 146225 CKLF like MARVEL transmembrane domain containing 2 CMTM2 This gene belongs to the chemokine-like factor gene superfamily, a novel family that links the chemokine and the transmembrane 4 superfamilies of signaling molecules. The protein encoded by this gene may play an important role in testicular development. NA
ENSG00000103485 23475 quinolinate phosphoribosyltransferase QPRT This gene encodes a key enzyme in catabolism of quinolinate, an intermediate in the tryptophan-nicotinamide adenine dinucleotide pathway. Quinolinate acts as a most potent endogenous excitotoxin to neurons. Elevation of quinolinate levels in the brain has been linked to the pathogenesis of neurodegenerative disorders such as epilepsy, Alzheimer’s disease, and Huntington’s disease. Alternative splicing results in multiple transcript variants. NA
ENSG00000111261 54682 MANSC domain containing 1 MANSC1 NA NA
ENSG00000152463 55301 oleoyl-ACP hydrolase OLAH NA NA
ENSG00000114200 590 butyrylcholinesterase BCHE Mutant alleles at the BCHE locus are responsible for suxamethonium sensitivity. Homozygous persons sustain prolonged apnea after administration of the muscle relaxant suxamethonium in connection with surgical anesthesia. The activity of pseudocholinesterase in the serum is low and its substrate behavior is atypical. In the absence of the relaxant, the homozygote is at no known disadvantage. NA
ENSG00000059915 5662 pleckstrin and Sec7 domain containing PSD This gene encodes a Pleckstrin homology and SEC7 domains-containing protein that functions as a guanine nucleotide exchange factor. The encoded protein regulates signal transduction by activating ADP-ribosylation factor 6. Alternative splicing results in multiple transcript variants. NA
ENSG00000158089 79623 polypeptide N-acetylgalactosaminyltransferase 14 GALNT14 This gene encodes a Golgi protein which is a member of the polypeptide N-acetylgalactosaminyltransferase (ppGalNAc-Ts) protein family. These enzymes catalyze the transfer of N-acetyl-D-galactosamine (GalNAc) to the hydroxyl groups on serines and threonines in target peptides. The encoded protein has been shown to transfer GalNAc to large proteins like mucins. Multiple transcript variants encoding different isoforms have been found for this gene. NA
ENSG00000130208 341 apolipoprotein C1 APOC1 This gene encodes a member of the apolipoprotein C1 family. This gene is expressed primarily in the liver, and it is activated when monocytes differentiate into macrophages. The encoded protein plays a central role in high density lipoprotein (HDL) and very low density lipoprotein (VLDL) metabolism. This protein has also been shown to inhibit cholesteryl ester transfer protein in plasma. A pseudogene of this gene is located 4 kb downstream in the same orientation, on the same chromosome. This gene is mapped to chromosome 19, where it resides within a apolipoprotein gene cluster. NA
ENSG00000054598 2296 forkhead box C1 FOXC1 This gene belongs to the forkhead family of transcription factors which is characterized by a distinct DNA-binding forkhead domain. The specific function of this gene has not yet been determined; however, it has been shown to play a role in the regulation of embryonic and ocular development. Mutations in this gene cause various glaucoma phenotypes including primary congenital glaucoma, autosomal dominant iridogoniodysgenesis anomaly, and Axenfeld-Rieger anomaly. NA
ENSG00000114993 6242 rhotekin RTKN This gene encodes a scaffold protein that interacts with GTP-bound Rho proteins. Binding of this protein inhibits the GTPase activity of Rho proteins. This protein may interfere with the conversion of active, GTP-bound Rho to the inactive GDP-bound form by RhoGAP. Rho proteins regulate many important cellular processes, including cytokinesis, transcription, smooth muscle contraction, cell growth and transformation. Dysregulation of the Rho signal transduction pathway has been implicated in many forms of cancer. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
ENSG00000134243 6272 sortilin 1 SORT1 This gene encodes a member of the VPS10-related sortilin family of proteins. The encoded preproprotein is proteolytically processed by furin to generate the mature receptor. This receptor plays a role in the trafficking of different proteins to either the cell surface, or subcellular compartments such as lysosomes and endosomes. Expression levels of this gene may influence the risk of myocardial infarction in human patients. Alternative splicing results in multiple transcript variants. NA
ENSG00000164221 153733 coiled-coil domain containing 112 CCDC112 NA NA
ENSG00000169282 7881 potassium voltage-gated channel subfamily A member regulatory beta subunit 1 KCNAB1 Potassium channels represent the most complex class of voltage-gated ion channels from both functional and structural standpoints. Their diverse functions include regulating neurotransmitter release, heart rate, insulin secretion, neuronal excitability, epithelial electrolyte transport, smooth muscle contraction, and cell volume. Four sequence-related potassium channel genes - shaker, shaw, shab, and shal - have been identified in Drosophila, and each has been shown to have human homolog(s). This gene encodes a member of the potassium channel, voltage-gated, shaker-related subfamily. This member includes distinct isoforms which are encoded by alternatively spliced transcript variants of this gene. Some of these isoforms are beta subunits, which form heteromultimeric complexes with alpha subunits and modulate the activity of the pore-forming alpha subunits. NA
ENSG00000133321 5920 retinoic acid receptor responder 3 RARRES3 Retinoids exert biologic effects such as potent growth inhibitory and cell differentiation activities and are used in the treatment of hyperproliferative dermatological diseases. These effects are mediated by specific nuclear receptor proteins that are members of the steroid and thyroid hormone receptor superfamily of transcriptional regulators. RARRES1, RARRES2, and RARRES3 are genes whose expression is upregulated by the synthetic retinoid tazarotene. RARRES3 is thought to act as a tumor suppressor or growth regulator. NA
ENSG00000228477 ENSG00000228477 NA RP3-342P20.2 NA NA
ENSG00000189269 51233 aspartate rich 1 DRICH1 NA NA
ENSG00000162545 55450 calcium/calmodulin dependent protein kinase II inhibitor 1 CAMK2N1 NA NA
ENSG00000121316 79887 phospholipase B domain containing 1 PLBD1 NA NA
ENSG00000180672 NA NA NA NA TRUE
ENSG00000268565 ENSG00000268565 NA AC005339.2 NA NA
ENSG00000128342 3976 leukemia inhibitory factor LIF The protein encoded by this gene is a pleiotropic cytokine with roles in several different systems. It is involved in the induction of hematopoietic differentiation in normal and myeloid leukemia cells, induction of neuronal cell differentiation, regulator of mesenchymal to epithelial conversion during kidney development, and may also have a role in immune tolerance at the maternal-fetal interface. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
ENSG00000118257 8828 neuropilin 2 NRP2 This gene encodes a member of the neuropilin family of receptor proteins. The encoded transmembrane protein binds to SEMA3C protein {sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3C} and SEMA3F protein {sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3F}, and interacts with vascular endothelial growth factor (VEGF). This protein may play a role in cardiovascular development, axon guidance, and tumorigenesis. Multiple transcript variants encoding distinct isoforms have been identified for this gene. NA
ENSG00000111644 84519 acrosin binding protein ACRBP The protein encoded by this gene is similar to proacrosin binding protein sp32 precursor found in mouse, guinea pig, and pig. This protein is located in the sperm acrosome and is thought to function as a binding protein to proacrosin for packaging and condensation of the acrosin zymogen in the acrosomal matrix. This protein is a member of the cancer/testis family of antigens and it is found to be immunogenic. In normal tissues, this mRNA is expressed only in testis, whereas it is detected in a range of different tumor types such as bladder, breast, lung, liver, and colon. NA
ENSG00000257764 ENSG00000257764 NA RP11-1143G9.4 NA NA
ENSG00000090382 4069 lysozyme LYZ This gene encodes human lysozyme, whose natural substrate is the bacterial cell wall peptidoglycan (cleaving the beta[1-4]glycosidic linkages between N-acetylmuramic acid and N-acetylglucosamine). Lysozyme is one of the antimicrobial agents found in human milk, and is also present in spleen, lung, kidney, white blood cells, plasma, saliva, and tears. The protein has antibacterial activity against a number of bacterial species. Missense mutations in this gene have been identified in heritable renal amyloidosis. NA
ENSG00000177374 3090 hypermethylated in cancer 1 HIC1 This gene functions as a growth regulatory and tumor repressor gene. Hypermethylation or deletion of the region of this gene have been associated with tumors and the contiguous-gene syndrome, Miller-Dieker syndrome. Alternative splicing of this gene results in multiple transcript variants. NA
ENSG00000135069 29968 phosphoserine aminotransferase 1 PSAT1 This gene encodes a member of the class-V pyridoxal-phosphate-dependent aminotransferase family. The encoded protein is a phosphoserine aminotransferase and decreased expression may be associated with schizophrenia. Mutations in this gene are also associated with phosphoserine aminotransferase deficiency. Alternative splicing results in multiple transcript variants. Pseudogenes of this gene have been defined on chromosomes 1, 3, and 8. NA
ENSG00000173193 54625 poly(ADP-ribose) polymerase family member 14 PARP14 Poly(ADP-ribosyl)ation is an immediate DNA damage-dependent posttranslational modification of histones and other nuclear proteins that contributes to the survival of injured proliferating cells. PARP14 belongs to the superfamily of enzymes that perform this modification (Ame et al., 2004 [PubMed 15273990]). NA
ENSG00000196782 55534 mastermind like transcriptional coactivator 3 MAML3 NA NA
ENSG00000049540 2006 elastin ELN This gene encodes a protein that is one of the two components of elastic fibers. The encoded protein is rich in hydrophobic amino acids such as glycine and proline, which form mobile hydrophobic regions bounded by crosslinks between lysine residues. Deletions and mutations in this gene are associated with supravalvular aortic stenosis (SVAS) and autosomal dominant cutis laxa. Multiple transcript variants encoding different isoforms have been found for this gene. NA
ENSG00000137285 347733 tubulin beta 2B class IIb TUBB2B The protein encoded by this gene is a beta isoform of tubulin, which binds GTP and is a major component of microtubules. This gene is highly similar to TUBB2A and TUBB2C. Defects in this gene are a cause of asymmetric polymicrogyria. NA
ENSG00000101194 63910 solute carrier family 17 member 9 SLC17A9 This gene encodes a member of a family of transmembrane proteins that are involved in the transport of small molecules. The encoded protein participates in the vesicular uptake, storage, and secretion of adenosine triphosphate (ATP) and other nucleotides. A mutation in this gene was found in individuals with autosomal dominant disseminated superficial actinic porokeratosis-8. Alternative splicing results in multiple transcript variants. NA
ENSG00000184524 51286 cell cycle exit and neuronal differentiation 1 CEND1 The protein encoded by this gene is a neuron-specific protein. The similar protein in pig enhances neuroblastoma cell differentiation in vitro and may be involved in neuronal differentiation in vivo. Multiple pseudogenes have been reported for this gene. NA
ENSG00000231856 ENSG00000231856 NA RP11-327P2.5 NA NA
ENSG00000185361 126282 TNF alpha induced protein 8 like 1 TNFAIP8L1 NA NA
ENSG00000184232 220323 out at first homolog OAF NA NA
ENSG00000104435 11075 stathmin 2 STMN2 This gene encodes a member of the stathmin family of phosphoproteins. Stathmin proteins function in microtubule dynamics and signal transduction. The encoded protein plays a regulatory role in neuronal growth and is also thought to be involved in osteogenesis. Reductions in the expression of this gene have been associated with Down’s syndrome and Alzheimer’s disease. Alternatively spliced transcript variants have been observed for this gene. A pseudogene of this gene is located on the long arm of chromosome 6. NA
ENSG00000126500 23769 fibronectin leucine rich transmembrane protein 1 FLRT1 This gene encodes a member of the fibronectin leucine rich transmembrane protein (FLRT) family. The family members may function in cell adhesion and/or receptor signalling. Their protein structures resemble small leucine-rich proteoglycans found in the extracellular matrix. The encoded protein shares sequence similarity with two other family members, FLRT2 and FLRT3. This gene is expressed in kidney and brain. NA
ENSG00000012124 933 CD22 molecule CD22 NA NA
ENSG00000166900 6809 syntaxin 3 STX3 The gene is a member of the syntaxin family. The encoded protein is targeted to the apical membrane of epithelial cells where it forms clusters and is important in establishing and maintaining polarity necessary for protein trafficking involving vesicle fusion and exocytosis. Alternative splicing results in multiple transcript variants. NA
ENSG00000246705 55766 H2A histone family member J H2AFJ Histones are basic nuclear proteins that are responsible for the nucleosome structure of the chromosomal fiber in eukaryotes. Nucleosomes consist of approximately 146 bp of DNA wrapped around a histone octamer composed of pairs of each of the four core histones (H2A, H2B, H3, and H4). The chromatin fiber is further compacted through the interaction of a linker histone, H1, with the DNA between the nucleosomes to form higher order chromatin structures. This gene is located on chromosome 12 and encodes a replication-independent histone that is a variant H2A histone. The protein is divergent at the C-terminus compared to the consensus H2A histone family member. This gene also encodes an antimicrobial peptide with antibacterial and antifungal activity. NA
ENSG00000099958 91319 derlin 3 DERL3 The protein encoded by this gene belongs to the derlin family, and resides in the endoplasmic reticulum (ER). Proteins that are unfolded or misfolded in the ER must be refolded or degraded to maintain the homeostasis of the ER. This protein appears to be involved in the degradation of misfolded glycoproteins in the ER. Several alternatively spliced transcript variants encoding different isoforms have been identified for this gene. NA
ENSG00000232810 7124 tumor necrosis factor TNF This gene encodes a multifunctional proinflammatory cytokine that belongs to the tumor necrosis factor (TNF) superfamily. This cytokine is mainly secreted by macrophages. It can bind to, and thus functions through its receptors TNFRSF1A/TNFR1 and TNFRSF1B/TNFBR. This cytokine is involved in the regulation of a wide spectrum of biological processes including cell proliferation, differentiation, apoptosis, lipid metabolism, and coagulation. This cytokine has been implicated in a variety of diseases, including autoimmune diseases, insulin resistance, and cancer. Knockout studies in mice also suggested the neuroprotective function of this cytokine. NA
ENSG00000090339 3383 intercellular adhesion molecule 1 ICAM1 This gene encodes a cell surface glycoprotein which is typically expressed on endothelial cells and cells of the immune system. It binds to integrins of type CD11a / CD18, or CD11b / CD18 and is also exploited by Rhinovirus as a receptor. NA
ENSG00000135083 79616 cyclin J like CCNJL NA NA
ENSG00000168389 84879 major facilitator superfamily domain containing 2A MFSD2A NA NA
ENSG00000172426 221421 radial spoke head 9 homolog RSPH9 This gene encodes a protein thought to be a component of the radial spoke head in motile cilia and flagella. Mutations in this gene are associated with primary ciliary dyskinesia 12. Alternative splicing results in multiple transcript variants. NA
ENSG00000272463 ENSG00000272463 NA RP11-532F6.3 NA NA
ENSG00000125510 4987 opioid related nociceptin receptor 1 OPRL1 The protein encoded by this gene is a member of the 7 transmembrane-spanning G protein-coupled receptor family, and functions as a receptor for the endogenous, opioid-related neuropeptide, nociceptin/orphanin FQ. This receptor-ligand system modulates a variety of biological functions and neurobehavior, including stress responses and anxiety behavior, learning and memory, locomotor activity, and inflammatory and immune responses. A promoter region between this gene and the 5’-adjacent RGS19 (regulator of G-protein signaling 19) gene on the opposite strand functions bi-directionally as a core-promoter for both genes, suggesting co-operative transcriptional regulation of these two functionally related genes. Alternatively spliced transcript variants have been described for this gene. A recent study provided evidence for translational readthrough in this gene and expression of an additional C-terminally extended isoform via the use of an alternative in-frame translation termination codon. NA
ENSG00000235151 ENSG00000235151 NA AC114730.2 NA NA
ENSG00000151743 196394 antagonist of mitotic exit network 1 homolog AMN1 NA NA
ENSG00000241360 57026 pyridoxal phosphatase PDXP Pyridoxal 5-prime-phosphate (PLP) is the active form of vitamin B6 that acts as a coenzyme in maintaining biochemical homeostasis. The preferred degradation route from PLP to 4-pyridoxic acid involves the dephosphorylation of PLP by PDXP (Jang et al., 2003 [PubMed 14522954]). NA
ENSG00000109472 1363 carboxypeptidase E CPE This gene encodes a member of the M14 family of metallocarboxypeptidases. The encoded preproprotein is proteolytically processed to generate the mature peptidase. This peripheral membrane protein cleaves C-terminal amino acid residues and is involved in the biosynthesis of peptide hormones and neurotransmitters, including insulin. This protein may also function independently of its peptidase activity, as a neurotrophic factor that promotes neuronal survival, and as a sorting receptor that binds to regulated secretory pathway proteins, including prohormones. Mutations in this gene are implicated in type 2 diabetes. NA
ENSG00000224846 ENSG00000224846 NA RP1-90J20.8 NA NA
ENSG00000211683 ENSG00000211683 NA KB-1572G7.3 NA NA
ENSG00000160932 4061 lymphocyte antigen 6 complex, locus E LY6E NA NA
ENSG00000135094 10993 serine dehydratase SDS This gene encodes one of three enzymes that are involved in metabolizing serine and glycine. L-serine dehydratase converts L-serine to pyruvate and ammonia and requires pyridoxal phosphate as a cofactor. The encoded protein can also metabolize threonine to NH4+ and 2-ketobutyrate. The encoded protein is found predominantly in the liver. NA
ENSG00000267387 ENSG00000267387 NA CTD-2240E14.4 NA NA
ENSG00000111110 57460 protein phosphatase, Mg2+/Mn2+ dependent 1H PPM1H NA NA
ENSG00000237940 102723927 uncharacterized LOC102723927 LOC102723927 NA NA
ENSG00000271020 ENSG00000271020 NA RP11-10C24.1 NA NA
ENSG00000112667 10591 2’-deoxynucleoside 5’-phosphate N-hydrolase 1 DNPH1 This gene was identified on the basis of its stimulation by c-Myc protein. The latter is a transcription factor that participates in the regulation of cell proliferation, differentiation, and apoptosis. The exact function of this gene is not known but studies in rat suggest a role in cellular proliferation and c-Myc-mediated transformation. Two alternative transcripts encoding different proteins have been described. NA
ENSG00000226833 ENSG00000226833 NA AC097724.3 NA NA
ENSG00000256341 ENSG00000256341 NA RP11-21A7A.3 NA NA
ENSG00000117228 2633 guanylate binding protein 1 GBP1 Guanylate binding protein expression is induced by interferon. Guanylate binding proteins are characterized by their ability to specifically bind guanine nucleotides (GMP, GDP, and GTP) and are distinguished from the GTP-binding proteins by the presence of 2 binding motifs rather than 3. NA
ENSG00000103335 9780 piezo type mechanosensitive ion channel component 1 PIEZO1 The protein encoded by this gene is a mechanically-activated ion channel that links mechanical forces to biological signals. The encoded protein contains 36 transmembrane domains and functions as a homotetramer. Defects in this gene have been associated with dehydrated hereditary stomatocytosis. NA
ENSG00000065615 51167 cytochrome b5 reductase 4 CYB5R4 NCB5OR is a flavohemoprotein that contains functional domains found in both cytochrome b5 (CYB5A; MIM 613218) and CYB5 reductase (CYB5R3; MIM 613213) (Zhu et al., 1999 [PubMed 10611283]). NA
ENSG00000163536 5274 serpin family I member 1 SERPINI1 This gene encodes a member of the serpin superfamily of serine proteinase inhibitors. The protein is primarily secreted by axons in the brain, and preferentially reacts with and inhibits tissue-type plasminogen activator. It is thought to play a role in the regulation of axonal growth and the development of synaptic plasticity. Mutations in this gene result in familial encephalopathy with neuroserpin inclusion bodies (FENIB), which is a dominantly inherited form of familial encephalopathy and epilepsy characterized by the accumulation of mutant neuroserpin polymers. Multiple alternatively spliced variants, encoding the same protein, have been identified. NA
ENSG00000175898 NA NA NA NA TRUE
ENSG00000109586 51809 polypeptide N-acetylgalactosaminyltransferase 7 GALNT7 This gene encodes GalNAc transferase 7, a member of the GalNAc-transferase family. The enzyme encoded by this gene controls the initiation step of mucin-type O-linked protein glycosylation and transfer of N-acetylgalactosamine to serine and threonine amino acid residues. This enzyme is a type II transmembrane protein and shares common sequence motifs with other family members. Unlike other family members, this enzyme shows exclusive specificity for partially GalNAc-glycosylated acceptor substrates and shows no activity with non-glycosylated peptides. This protein may function as a follow-up enzyme in the initiation step of O-glycosylation. NA
ENSG00000119681 4053 latent transforming growth factor beta binding protein 2 LTBP2 The protein encoded by this gene belongs to the family of latent transforming growth factor (TGF)-beta binding proteins (LTBP), which are extracellular matrix proteins with multi-domain structure. This protein is the largest member of the LTBP family possessing unique regions and with most similarity to the fibrillins. It has thus been suggested that it may have multiple functions: as a member of the TGF-beta latent complex, as a structural component of microfibrils, and a role in cell adhesion. NA
ENSG00000162591 1953 multiple EGF like domains 6 MEGF6 NA NA
ENSG00000232222 NA NA NA NA TRUE
ENSG00000248774 ENSG00000248774 NA RP11-798M19.3 NA NA
ENSG00000264924 ENSG00000264924 NA RP11-799B12.2 NA NA
ENSG00000120594 84898 plexin domain containing 2 PLXDC2 NA NA
ENSG00000136156 9445 integral membrane protein 2B ITM2B Amyloid precursor proteins are processed by beta-secretase and gamma-secretase to produce beta-amyloid peptides which form the characteristic plaques of Alzheimer disease. This gene encodes a transmembrane protein which is processed at the C-terminus by furin or furin-like proteases to produce a small secreted peptide which inhibits the deposition of beta-amyloid. Mutations which result in extension of the C-terminal end of the encoded protein, thereby increasing the size of the secreted peptide, are associated with two neurodegenerative diseases, familial British dementia and familial Danish dementia. NA
ENSG00000224459 ENSG00000224459 NA RP11-169K16.4 NA NA
ENSG00000115271 25801 grancalcin GCA This gene product, grancalcin, is a calcium-binding protein abundant in neutrophils and macrophages. It belongs to the penta-EF-hand subfamily of proteins which includes sorcin, calpain, and ALG-2. Grancalcin localization is dependent upon calcium and magnesium. In the absence of divalent cation, grancalcin localizes to the cytosolic fraction; with magnesium alone, it partitions with the granule fraction; and in the presence of magnesium and calcium, it associates with both the granule and membrane fractions, suggesting a role for grancalcin in granule-membrane fusion and degranulation. NA
ENSG00000140545 4240 milk fat globule-EGF factor 8 protein MFGE8 This gene encodes a preproprotein that is proteolytically processed to form multiple protein products. The major encoded protein product, lactadherin, is a membrane glycoprotein that promotes phagocytosis of apoptotic cells. This protein has also been implicated in wound healing, autoimmune disease, and cancer. Lactadherin can be further processed to form a smaller cleavage product, medin, which comprises the major protein component of aortic medial amyloid (AMA). Alternative splicing results in multiple transcript variants. NA
ENSG00000122877 1959 early growth response 2 EGR2 The protein encoded by this gene is a transcription factor with three tandem C2H2-type zinc fingers. Defects in this gene are associated with Charcot-Marie-Tooth disease type 1D (CMT1D), Charcot-Marie-Tooth disease type 4E (CMT4E), and with Dejerine-Sottas syndrome (DSS). Multiple transcript variants encoding two different isoforms have been found for this gene. NA
ENSG00000249685 ENSG00000249685 NA RP11-360F5.3 NA NA
ENSG00000136999 4856 nephroblastoma overexpressed NOV The protein encoded by this gene is a small secreted cysteine-rich protein and a member of the CCN family of regulatory proteins. CCN family proteins associate with the extracellular matrix and play an important role in cardiovascular and skeletal development, fibrosis and cancer development. NA
ENSG00000160293 7410 vav guanine nucleotide exchange factor 2 VAV2 VAV2 is the second member of the VAV guanine nucleotide exchange factor family of oncogenes. Unlike VAV1, which is expressed exclusively in hematopoietic cells, VAV2 transcripts were found in most tissues. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
ENSG00000267601 ENSG00000267601 NA RP11-323N12.5 NA NA
ENSG00000145506 85409 naked cuticle homolog 2 NKD2 This gene encodes a member of a family of proteins that function as negative regulators of Wnt receptor signaling through interaction with Dishevelled family members. The encoded protein participates in the delivery of transforming growth factor alpha-containing vesicles to the cell membrane. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
ENSG00000261771 100533483 DYX1C1-CCPG1 readthrough (NMD candidate) DYX1C1-CCPG1 This locus represents naturally occurring read-through transcription between the neighboring dyslexia susceptibility 1 candidate 1 (DYX1C1) and cell cycle progression 1 (CCPG1) genes on chromosome 15. The read-through transcript is a candidate for nonsense-mediated mRNA decay (NMD), and is thus unlikely to produce a protein product. NA
ENSG00000250900 ENSG00000250900 NA CTC-338M12.6 NA NA
ENSG00000029534 286 ankyrin 1 ANK1 Ankyrins are a family of proteins that link the integral membrane proteins to the underlying spectrin-actin cytoskeleton and play key roles in activities such as cell motility, activation, proliferation, contact and the maintenance of specialized membrane domains. Multiple isoforms of ankyrin with different affinities for various target proteins are expressed in a tissue-specific, developmentally regulated manner. Most ankyrins are typically composed of three structural domains: an amino-terminal domain containing multiple ankyrin repeats; a central region with a highly conserved spectrin binding domain; and a carboxy-terminal regulatory domain which is the least conserved and subject to variation. Ankyrin 1, the prototype of this family, was first discovered in the erythrocytes, but since has also been found in brain and muscles. Mutations in erythrocytic ankyrin 1 have been associated in approximately half of all patients with hereditary spherocytosis. Complex patterns of alternative splicing in the regulatory domain, giving rise to different isoforms of ankyrin 1 have been described. Truncated muscle-specific isoforms of ankyrin 1 resulting from usage of an alternate promoter have also been identified. NA
ENSG00000115339 2591 polypeptide N-acetylgalactosaminyltransferase 3 GALNT3 This gene encodes UDP-GalNAc transferase 3, a member of the GalNAc-transferases family. This family transfers an N-acetyl galactosamine to the hydroxyl group of a serine or threonine residue in the first step of O-linked oligosaccharide biosynthesis. Individual GalNAc-transferases have distinct activities and initiation of O-glycosylation is regulated by a repertoire of GalNAc-transferases. The protein encoded by this gene is highly homologous to other family members, however the enzymes have different substrate specificities. NA
ENSG00000106785 9830 tripartite motif containing 14 TRIM14 The protein encoded by this gene is a member of the tripartite motif (TRIM) family. The TRIM motif includes three zinc-binding domains, a RING, a B-box type 1 and a B-box type 2, and a coiled-coil region. The protein localizes to cytoplasmic bodies and its function has not been determined. Alternative splicing results in multiple transcript variants. NA
ENSG00000232415 ENSG00000232415 NA CTB-51J22.1 NA NA
ENSG00000133121 90627 StAR related lipid transfer domain containing 13 STARD13 This gene encodes a protein which contains an N-terminal sterile alpha motif (SAM) for protein-protein interactions, followed by an ATP/GTP-binding motif, a GTPase-activating protein (GAP) domain, and a C-terminal STAR-related lipid transfer (START) domain. It may be involved in regulation of cytoskeletal reorganization, cell proliferation, and cell motility, and acts as a tumor suppressor in hepatoma cells. The gene is located in a region of chromosome 13 that is associated with loss of heterozygosity in hepatocellular carcinomas. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. NA
ENSG00000271643 ENSG00000271643 NA RP11-10C24.3 NA NA
ENSG00000124570 5269 serpin family B member 6 SERPINB6 The protein encoded by this gene is a member of the serpin (serine proteinase inhibitor) superfamily, and ovalbumin(ov)-serpin subfamily. It was originally discovered as a placental thrombin inhibitor. The mouse homolog was found to be expressed in the hair cells of the inner ear. Mutations in this gene are associated with nonsyndromic progressive hearing loss, suggesting that this serpin plays an important role in the inner ear in the protection against leakage of lysosomal content during stress, and that loss of this protection results in cell death and sensorineural hearing loss. Alternatively spliced transcript variants have been found for this gene. NA
ENSG00000163191 6282 S100 calcium binding protein A11 S100A11 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in motility, invasion, and tubulin polymerization. Chromosomal rearrangements and altered expression of this gene have been implicated in tumor metastasis. NA
ENSG00000260121 ENSG00000260121 NA RP5-1142A6.9 NA NA
ENSG00000198598 4326 matrix metallopeptidase 17 MMP17 This gene encodes a member of the peptidase M10 family and membrane-type subfamily of matrix metalloproteinases (MMPs). Proteins in this family are involved in the breakdown of extracellular matrix in normal physiological processes, such as embryonic development, reproduction, and tissue remodeling, as well as in disease processes, such as arthritis and metastasis. Members of this subfamily contain a transmembrane domain suggesting that these proteins are expressed at the cell surface rather than secreted. The encoded preproprotein is proteolytically processed to generate the mature protease. This protein is unique among the membrane-type matrix metalloproteinases in that it is anchored to the cell membrane via a glycosylphosphatidylinositol (GPI) anchor. Elevated expression of the encoded protein has been observed in osteoarthritis and multiple human cancers. NA
ENSG00000140443 3480 insulin like growth factor 1 receptor IGF1R This receptor binds insulin-like growth factor with a high affinity. It has tyrosine kinase activity. The insulin-like growth factor I receptor plays a critical role in transformation events. Cleavage of the precursor generates alpha and beta subunits. It is highly overexpressed in most malignant tissues where it functions as an anti-apoptotic agent by enhancing cell survival. Alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_",5,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
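
The query-and-write step above is repeated for each factor with only the factor index changed. A minimal sketch of wrapping the write step in a helper follows. The helper name `write_factor_genes`, the toy matrix `gene_list_demo`, and the temporary output directory are assumptions made for illustration and are not part of the original analysis; the sketch also assumes the IDs in `out$query` are the same as the corresponding row of `gene_list`.

```r
# Hypothetical helper: write the gene IDs of factor k to a text file,
# one ID per line, mirroring the write.table() call used above.
write_factor_genes <- function(gene_list, k, out_dir) {
  out_file <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(gene_list[k, ], out_file,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(out_file)
}

# Toy stand-in for gene_list (rows = factors, columns = top genes).
gene_list_demo <- rbind(c("ENSG00000122877", "ENSG00000136999"),
                        c("ENSG00000171747", "ENSG00000139540"))

# Write one file per factor into a temporary directory (assumed location).
out_dir <- tempdir()
files <- vapply(seq_len(nrow(gene_list_demo)),
                function(k) write_factor_genes(gene_list_demo, k, out_dir),
                character(1))
```

Taking the factor index as an argument keeps the file name and the row of `gene_list` in step, which is easy to get wrong when the call is copied by hand for each factor.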

Factor 6 Annotations

out <- mygene::queryMany(gene_list[6,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
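
Some queried IDs return no annotation; the result marks them with `notfound = TRUE` and leaves the column `NA` otherwise. A minimal sketch of separating the two groups before tabulating is given below. The data frame `out_demo` is a toy stand-in for `as.data.frame(out)` with three queries from this factor, and the variable names are assumptions for illustration only.

```r
# Toy stand-in for as.data.frame(out); notfound is TRUE for unmatched
# queries and NA for matched ones, as in the mygene result.
out_demo <- data.frame(
  query    = c("ENSG00000171747", "ENSG00000165862", "ENSG00000139540"),
  symbol   = c("LGALS4", NA, "SLC39A5"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Treat NA in notfound as "found"; only an explicit TRUE marks a miss.
is_missing  <- !is.na(out_demo$notfound) & out_demo$notfound
missing_ids <- out_demo$query[is_missing]   # IDs with no annotation
annotated   <- out_demo[!is_missing, ]      # rows that can be tabulated
```

Keeping `missing_ids` separately makes it possible to report how many of the top genes for a factor could not be annotated.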
kable(as.data.frame(out))
symbol X_id summary query name notfound
LGALS4 3960 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. The expression of this gene is restricted to small intestine, colon, and rectum, and it is underexpressed in colorectal cancer. ENSG00000171747 galectin 4 NA
NA NA NA ENSG00000165862 NA TRUE
SLC39A5 283375 The protein encoded by this gene belongs to the ZIP family of zinc transporters that transport zinc into cells from outside, and play a crucial role in controlling intracellular zinc levels. Zinc is an essential cofactor for many enzymes and proteins involved in gene transcription, growth, development and differentiation. Mutations in this gene have been associated with autosomal dominant high myopia (MYP24). Alternatively spliced transcript variants have been found for this gene. ENSG00000139540 solute carrier family 39 member 5 NA
REG1A 5967 This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. ENSG00000115386 regenerating family member 1 alpha NA
NR4A3 8013 This gene encodes a member of the steroid-thyroid hormone-retinoid receptor superfamily. The encoded protein may act as a transcriptional activator. The protein can efficiently bind the NGFI-B Response Element (NBRE). Three different versions of extraskeletal myxoid chondrosarcomas (EMCs) are the result of reciprocal translocations between this gene and other genes. The translocation breakpoints are associated with Nuclear Receptor Subfamily 4, Group A, Member 3 (on chromosome 9) and either Ewing Sarcoma Breakpoint Region 1 (on chromosome 22), RNA Polymerase II, TATA Box-Binding Protein-Associated Factor, 68-KD (on chromosome 17), or Transcription factor 12 (on chromosome 15). Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000119508 nuclear receptor subfamily 4 group A member 3 NA
PRSS3 5646 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is expressed in the brain and pancreas and is resistant to common trypsin inhibitors. It is active on peptide linkages involving the carboxyl group of lysine or arginine. This gene is localized to the locus of T cell receptor beta variable orphans on chromosome 9. Four transcript variants encoding different isoforms have been described for this gene. ENSG00000010438 protease, serine 3 NA
PGA3 643834 This gene encodes a protein precursor of the digestive enzyme pepsin, a member of the peptidase A1 family of endopeptidases. The encoded precursor is secreted by gastric chief cells and undergoes autocatalytic cleavage in acidic conditions to form the active enzyme, which functions in the digestion of dietary proteins. This gene is found in a cluster of related genes on chromosome 11, each of which encodes one of multiple pepsinogens. Pepsinogen levels in serum may serve as a biomarker for atrophic gastritis and gastric cancer. ENSG00000229859 pepsinogen 3, group I (pepsinogen A) NA
RGS16 6004 The protein encoded by this gene belongs to the ‘regulator of G protein signaling’ family. It inhibits signal transduction by increasing the GTPase activity of G protein alpha subunits. It also may play a role in regulating the kinetics of signaling in the phototransduction cascade. ENSG00000143333 regulator of G-protein signaling 16 NA
MSX1 4487 This gene encodes a member of the muscle segment homeobox gene family. The encoded protein functions as a transcriptional repressor during embryogenesis through interactions with components of the core transcription complex and other homeoproteins. It may also have roles in limb-pattern formation, craniofacial development, particularly odontogenesis, and tumor growth inhibition. Mutations in this gene, which was once known as homeobox 7, have been associated with nonsyndromic cleft lip with or without cleft palate 5, Witkop syndrome, Wolf-Hirschhorn syndrome, and autosomal dominant hypodontia. ENSG00000163132 msh homeobox 1 NA
REG1B 5968 This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. ENSG00000172023 regenerating family member 1 beta NA
SPINK1 6690 The protein encoded by this gene is a trypsin inhibitor, which is secreted from pancreatic acinar cells into pancreatic juice. It is thought to function in the prevention of trypsin-catalyzed premature activation of zymogens within the pancreas and the pancreatic duct. Mutations in this gene are associated with hereditary pancreatitis and tropical calcific pancreatitis. ENSG00000164266 serine peptidase inhibitor, Kazal type 1 NA
REG3A 5068 This gene encodes a pancreatic secretory protein that may be involved in cell proliferation or differentiation. It has similarity to the C-type lectin superfamily. The enhanced expression of this gene is observed during pancreatic inflammation and liver carcinogenesis. The mature protein also functions as an antimicrobial protein with antibacterial activity. Alternate splicing results in multiple transcript variants that encode the same protein. ENSG00000172016 regenerating family member 3 alpha NA
STX11 8676 This gene encodes a member of the syntaxin family. Syntaxins have been implicated in the targeting and fusion of intracellular transport vesicles. This family member may regulate protein transport among late endosomes and the trans-Golgi network. Mutations in this gene have been associated with familial hemophagocytic lymphohistiocytosis. ENSG00000135604 syntaxin 11 NA
CCDC151 115948 This gene encodes a protein containing coiled-coil domains. The encoded protein functions in outer dynein arm assembly and is required for motile cilia function. Mutations in this gene result in primary ciliary dyskinesia. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000198003 coiled-coil domain containing 151 NA
GPR84 53831 NA ENSG00000139572 G protein-coupled receptor 84 NA
TMED6 146456 NA ENSG00000157315 transmembrane p24 trafficking protein 6 NA
CD200 4345 This gene encodes a type I membrane glycoprotein containing two extracellular immunoglobulin domains, a transmembrane and a cytoplasmic domain. This gene is expressed by various cell types, including B cells, a subset of T cells, thymocytes, endothelial cells, and neurons. The encoded protein plays an important role in immunosuppression and regulation of anti-tumor activity. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000091972 CD200 molecule NA
AKR7L ENSG00000211454 NA ENSG00000211454 aldo-keto reductase family 7-like (gene/pseudogene) NA
SLC16A6 9120 NA ENSG00000108932 solute carrier family 16 member 6 NA
AKR7A3 22977 Aldo-keto reductases, such as AKR7A3, are involved in the detoxification of aldehydes and ketones. ENSG00000162482 aldo-keto reductase family 7 member A3 NA
C1QL1 10882 NA ENSG00000131094 complement component 1, q subcomponent-like 1 NA
LRP8 7804 This gene encodes a member of the low density lipoprotein receptor (LDLR) family. Low density lipoprotein receptors are cell surface proteins that play roles in both signal transduction and receptor-mediated endocytosis of specific ligands for lysosomal degradation. The encoded protein plays a critical role in the migration of neurons during development by mediating Reelin signaling, and also functions as a receptor for the cholesterol transport protein apolipoprotein E. Expression of this gene may be a marker for major depressive disorder. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000157193 LDL receptor related protein 8 NA
CYP3A5 1577 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. The encoded protein metabolizes drugs as well as the steroid hormones testosterone and progesterone. This gene is part of a cluster of cytochrome P450 genes on chromosome 7q21.1. Two pseudogenes of this gene have been identified within this cluster on chromosome 7. Expression of this gene is widely variable among populations, and a single nucleotide polymorphism that affects transcript splicing has been associated with susceptibility to hypertension. Alternative splicing results in multiple transcript variants. ENSG00000106258 cytochrome P450 family 3 subfamily A member 5 NA
GKN1 56287 The protein encoded by this gene is found to be down-regulated in human gastric cancer tissue as compared to normal gastric mucosa. ENSG00000169605 gastrokine 1 NA
FMO5 2330 Metabolic N-oxidation of the diet-derived amino-trimethylamine (TMA) is mediated by flavin-containing monooxygenase and is subject to an inherited FMO3 polymorphism in man resulting in a small subpopulation with reduced TMA N-oxidation capacity resulting in fish odor syndrome Trimethylaminuria. Three forms of the enzyme, FMO1 found in fetal liver, FMO2 found in adult liver, and FMO3 are encoded by genes clustered in the 1q23-q25 region. Flavin-containing monooxygenases are NADPH-dependent flavoenzymes that catalyze the oxidation of soft nucleophilic heteroatom centers in drugs, pesticides, and xenobiotics. Alternative splicing results in multiple transcript variants. ENSG00000131781 flavin containing monooxygenase 5 NA
ATP1B3-AS1 ENSG00000244124 NA ENSG00000244124 ATP1B3 antisense RNA 1 NA
FBXW4P1 26226 NA ENSG00000230701 F-box and WD repeat domain containing 4 pseudogene 1 NA
GP2 2813 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. ENSG00000169347 glycoprotein 2 NA
LIPF 8513 This gene encodes gastric lipase, an enzyme involved in the digestion of dietary triglycerides in the gastrointestinal tract, and responsible for 30% of fat digestion processes occurring in human. It is secreted by gastric chief cells in the fundic mucosa of the stomach, and it hydrolyzes the ester bonds of triglycerides under acidic pH conditions. The gene is a member of a conserved gene family of lipases that play distinct roles in neutral lipid metabolism. Several transcript variants encoding different isoforms have been found for this gene. ENSG00000182333 lipase F, gastric type NA
SH3D21 79729 NA ENSG00000214193 SH3 domain containing 21 NA
OLFM4 10562 This gene was originally cloned from human myeloblasts and found to be selectively expressed in inflamed colonic epithelium. This gene encodes a member of the olfactomedin family. The encoded protein is an antiapoptotic factor that promotes tumor growth and is an extracellular matrix glycoprotein that facilitates cell adhesion. ENSG00000102837 olfactomedin 4 NA
TMEM217 221468 NA ENSG00000172738 transmembrane protein 217 NA
CBARP 255057 NA ENSG00000099625 CACN beta subunit associated regulatory protein NA
RASD1 51655 This gene encodes a member of the Ras superfamily of small GTPases and is induced by dexamethasone. The encoded protein is an activator of G-protein signaling and acts as a direct nucleotide exchange factor for Gi-Go proteins. This protein interacts with the neuronal nitric oxide adaptor protein CAPON, and a nuclear adaptor protein FE65, which interacts with the Alzheimer’s disease amyloid precursor protein. This gene may play a role in dexamethasone-induced alterations in cell morphology, growth and cell-extracellular matrix interactions. Epigenetic inactivation of this gene is closely correlated with resistance to dexamethasone in multiple myeloma cells. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000108551 ras related dexamethasone induced 1 NA
PANX2 56666 The protein encoded by this gene belongs to the innexin family. Innexin family members are the structural components of gap junctions. This protein and pannexin 1 are abundantly expressed in central nervous system (CNS) and are coexpressed in various neuronal populations. Studies in Xenopus oocytes suggest that this protein alone and in combination with pannexin 1 may form cell type-specific gap junctions with distinct properties. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000073150 pannexin 2 NA
HHATL 57467 NA ENSG00000010282 hedgehog acyltransferase-like NA
WDR66 144406 The protein encoded by this gene belongs to the WD repeat-containing family of proteins, which function in the formation of protein-protein complexes in a variety of biological pathways. This family member appears to function in the determination of mean platelet volume (MPV), and polymorphisms in this gene have been associated with variance in MPV. Alternative splicing of this gene results in multiple transcript variants. ENSG00000158023 WD repeat domain 66 NA
RP11-337C18.8 ENSG00000237188 NA ENSG00000237188 NA NA
ESAM 90952 NA ENSG00000149564 endothelial cell adhesion molecule NA
NR4A1 3164 This gene encodes a member of the steroid-thyroid hormone-retinoid receptor superfamily. Expression is induced by phytohemagglutinin in human lymphocytes and by serum stimulation of arrested fibroblasts. The encoded protein acts as a nuclear transcription factor. Translocation of the protein from the nucleus to mitochondria induces apoptosis. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000123358 nuclear receptor subfamily 4 group A member 1 NA
FOSL1 8061 The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. Several transcript variants encoding different isoforms have been found for this gene. ENSG00000175592 FOS like 1, AP-1 transcription factor subunit NA
SELENBP1 8991 This gene encodes a member of the selenium-binding protein family. Selenium is an essential nutrient that exhibits potent anticarcinogenic properties, and deficiency of selenium may cause certain neurologic diseases. The effects of selenium in preventing cancer and neurologic diseases may be mediated by selenium-binding proteins, and decreased expression of this gene may be associated with several types of cancer. The encoded protein may play a selenium-dependent role in ubiquitination/deubiquitination-mediated protein degradation. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000143416 selenium binding protein 1 NA
PBLD 64081 NA ENSG00000108187 phenazine biosynthesis like protein domain containing NA
ZNF331 55422 This gene encodes a zinc finger protein containing a KRAB (Kruppel-associated box) domain found in transcriptional repressors. This gene may be methylated and silenced in cancer cells. This gene is located within a differentially methylated region (DMR) and shows allele-specific expression in placenta. Alternative splicing and the use of alternative promoters results in multiple transcript variants encoding the same protein. ENSG00000130844 zinc finger protein 331 NA
LIPG 9388 The protein encoded by this gene has substantial phospholipase activity and may be involved in lipoprotein metabolism and vascular biology. This protein is designated a member of the TG lipase family by its sequence and characteristic lid region which provides substrate specificity for enzymes of the TG lipase family. ENSG00000101670 lipase G, endothelial type NA
HNRNPA1P59 ENSG00000230280 NA ENSG00000230280 heterogeneous nuclear ribonucleoprotein A1 pseudogene 59 NA
SEMA4A 64218 This gene encodes a member of the semaphorin family of soluble and transmembrane proteins. Semaphorins are involved in numerous functions, including axon guidance, morphogenesis, carcinogenesis, and immunomodulation. The encoded protein is a single-pass type I membrane protein containing an immunoglobulin-like C2-type domain, a PSI domain and a sema domain. It inhibits axonal extension by providing local signals to specify territories inaccessible for growing axons. It is an activator of T-cell-mediated immunity and suppresses vascular endothelial growth factor (VEGF)-mediated endothelial cell migration and proliferation in vitro and angiogenesis in vivo. Mutations in this gene are associated with retinal degenerative diseases including retinitis pigmentosa type 35 (RP35) and cone-rod dystrophy type 10 (CORD10). Multiple alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000196189 semaphorin 4A NA
SGK2 10110 This gene encodes a serine/threonine protein kinase. Although this gene product is similar to serum- and glucocorticoid-induced protein kinase (SGK), this gene is not induced by serum or glucocorticoids. This gene is induced in response to signals that activate phosphatidylinositol 3-kinase, which is also true for SGK. Alternative splicing results in multiple transcript variants. ENSG00000101049 SGK2, serine/threonine kinase 2 NA
NA NA NA ENSG00000271769 NA TRUE
NA NA NA ENSG00000250606 NA TRUE
FGF11 2256 The protein encoded by this gene is a member of the fibroblast growth factor (FGF) family. FGF family members possess broad mitogenic and cell survival activities, and are involved in a variety of biological processes, including embryonic development, cell growth, morphogenesis, tissue repair, tumor growth and invasion. The function of this gene has not yet been determined. The expression pattern of the mouse homolog implies a role in nervous system development. Alternative splicing results in multiple transcript variants. ENSG00000161958 fibroblast growth factor 11 NA
HOPX 84525 The protein encoded by this gene is a homeodomain protein that lacks certain conserved residues required for DNA binding. It was reported that choriocarcinoma cell lines and tissues failed to express this gene, which suggested the possible involvement of this gene in malignant conversion of placental trophoblasts. Studies in mice suggest that this protein may interact with serum response factor (SRF) and modulate SRF-dependent cardiac-specific gene expression and cardiac development. Multiple alternatively spliced transcript variants have been identified for this gene. ENSG00000171476 HOP homeobox NA
LOC101929523 101929523 NA ENSG00000226445 uncharacterized LOC101929523 NA
PHYHIP 9796 NA ENSG00000168490 phytanoyl-CoA 2-hydroxylase interacting protein NA
CPA2 1358 Three different forms of human pancreatic procarboxypeptidase A have been isolated. The encoded protein represents the A2 form, which is a monomeric protein with different biochemical properties from the A1 and A3 forms. The A2 form of pancreatic procarboxypeptidase acts on aromatic C-terminal residues and is a secreted protein. ENSG00000158516 carboxypeptidase A2 NA
HILPDA 29923 NA ENSG00000135245 hypoxia inducible lipid droplet associated NA
ATG9B 285973 This gene functions in the regulation of autophagy, a lysosomal degradation pathway. This gene also functions as an antisense transcript in the posttranscriptional regulation of the endothelial nitric oxide synthase 3 gene, which has 3’ overlap with this gene on the opposite strand. Mutations in this gene and disruption of the autophagy process have been associated with multiple cancers. Alternative splicing results in multiple transcript variants. ENSG00000181652 autophagy related 9B NA
LOC284648 284648 NA ENSG00000261504 uncharacterized LOC284648 NA
TOX 9760 The protein encoded by this gene contains a HMG box DNA binding domain. HMG boxes are found in many eukaryotic proteins involved in chromatin assembly, transcription and replication. This protein may function to regulate T-cell development. ENSG00000198846 thymocyte selection associated high mobility group box NA
CASP6 839 This gene encodes a member of the cysteine-aspartic acid protease (caspase) family of enzymes. Sequential activation of caspases plays a central role in the execution-phase of cell apoptosis. Caspases exist as inactive proenzymes which undergo proteolytic processing at conserved aspartic acid residues to produce two subunits, large and small, that dimerize to form the active enzyme. This protein is processed by caspases 7, 8 and 10, and is thought to function as a downstream enzyme in the caspase activation cascade. Alternative splicing of this gene results in multiple transcript variants that encode different isoforms. ENSG00000138794 caspase 6 NA
BHLHE40 8553 This gene encodes a basic helix-loop-helix protein expressed in various tissues. The encoded protein can interact with ARNTL or compete for E-box binding sites in the promoter of PER1 and repress CLOCK/ARNTL’s transactivation of PER1. This gene is believed to be involved in the control of circadian rhythm and cell differentiation. ENSG00000134107 basic helix-loop-helix family member e40 NA
P2RX1 5023 The protein encoded by this gene belongs to the P2X family of G-protein-coupled receptors. These proteins can form homo- and heterotrimers, function as ATP-gated ion channels, and mediate rapid and selective permeability to cations. This protein is primarily localized to smooth muscle, where it binds ATP and mediates synaptic transmission between neurons and from neurons to smooth muscle, and may be responsible for sympathetic vasoconstriction in small arteries, arterioles and vas deferens. Mouse studies suggest that this receptor is essential for normal male reproductive function. This protein may also be involved in promoting apoptosis. ENSG00000108405 purinergic receptor P2X 1 NA
CALML4 91860 NA ENSG00000129007 calmodulin like 4 NA
SEMA3B 7869 The protein encoded by this gene belongs to the class-3 semaphorin/collapsin family, whose members function in growth cone guidance during neuronal development. This family member inhibits axonal extension and has been shown to act as a tumor suppressor by inducing apoptosis. Alternative splicing of this gene results in multiple transcript variants. ENSG00000012171 semaphorin 3B NA
THBD 7056 The protein encoded by this intronless gene is an endothelial-specific type I membrane receptor that binds thrombin. This binding results in the activation of protein C, which degrades clotting factors Va and VIIIa and reduces the amount of thrombin generated. Mutations in this gene are a cause of thromboembolic disease, also known as inherited thrombophilia. ENSG00000178726 thrombomodulin NA
ABLIM2 84448 NA ENSG00000163995 actin binding LIM protein family member 2 NA
RELL2 285613 NA ENSG00000164620 RELT like 2 NA
TG 7038 Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. ENSG00000042832 thyroglobulin NA
ZNF69 7620 NA ENSG00000198429 zinc finger protein 69 NA
LGALS7B 653499 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. Differential and in situ hybridization studies indicate that this lectin is specifically expressed in keratinocytes and found mainly in stratified squamous epithelium. A duplicate copy of this gene (GeneID:3963) is found adjacent to, but on the opposite strand on chromosome 19. ENSG00000178934 galectin 7B NA
CTD-2651B20.1 ENSG00000259539 NA ENSG00000259539 NA NA
RP11-56M3.1 ENSG00000234043 NA ENSG00000234043 NA NA
NAP1L5 266812 This gene encodes a protein that shares sequence similarity to nucleosome assembly factors, but may be localized to the cytoplasm rather than the nucleus. Expression of this gene is downregulated in hepatocellular carcinomas. This gene is located within a differentially methylated region (DMR) and is imprinted and paternally expressed. There is a related pseudogene on chromosome 4. ENSG00000177432 nucleosome assembly protein 1 like 5 NA
STMN2 11075 This gene encodes a member of the stathmin family of phosphoproteins. Stathmin proteins function in microtubule dynamics and signal transduction. The encoded protein plays a regulatory role in neuronal growth and is also thought to be involved in osteogenesis. Reductions in the expression of this gene have been associated with Down’s syndrome and Alzheimer’s disease. Alternatively spliced transcript variants have been observed for this gene. A pseudogene of this gene is located on the long arm of chromosome 6. ENSG00000104435 stathmin 2 NA
PGF 5228 This gene encodes a growth factor found in placenta which is homologous to vascular endothelial growth factor. Alternatively spliced transcripts encoding different isoforms have been found for this gene. ENSG00000119630 placental growth factor NA
ALDOB 229 Fructose-1,6-bisphosphate aldolase (EC 4.1.2.13) is a tetrameric glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Vertebrates have 3 aldolase isozymes which are distinguished by their electrophoretic and catalytic properties. Differences indicate that aldolases A, B, and C are distinct proteins, the products of a family of related ‘housekeeping’ genes exhibiting developmentally regulated expression of the different isozymes. The developing embryo produces aldolase A, which is produced in even greater amounts in adult muscle where it can be as much as 5% of total cellular protein. In adult liver, kidney and intestine, aldolase A expression is repressed and aldolase B is produced. In brain and other nervous tissue, aldolase A and C are expressed about equally. There is a high degree of homology between aldolase A and C. Defects in ALDOB cause hereditary fructose intolerance. ENSG00000136872 aldolase, fructose-bisphosphate B NA
PRSS1 5644 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. ENSG00000204983 protease, serine 1 NA
CTD-2006C1.12 ENSG00000267274 NA ENSG00000267274 NA NA
CHSY1 22856 This gene encodes a member of the chondroitin N-acetylgalactosaminyltransferase family. These enzymes possess dual glucuronyltransferase and galactosaminyltransferase activity and play critical roles in the biosynthesis of chondroitin sulfate, a glycosaminoglycan involved in many biological processes including cell proliferation and morphogenesis. Decreased expression of this gene may play a role in colorectal cancer, and mutations in this gene are a cause of Temtamy preaxial brachydactyly syndrome. ENSG00000131873 chondroitin sulfate synthase 1 NA
JCHAIN 3512 NA ENSG00000132465 joining chain of multimeric IgA and IgM NA
CYP17A1-AS1 102724307 NA ENSG00000203886 CYP17A1 antisense RNA 1 NA
LGR4 55366 G protein-coupled receptors (GPCRs) play key roles in a variety of physiologic functions. Members of the leucine-rich GPCR (LGR) family, such as GPR48, have multiple N-terminal leucine-rich repeats (LRRs) and a 7-transmembrane domain (Weng et al., 2008 [PubMed 18424556]). ENSG00000205213 leucine rich repeat containing G protein-coupled receptor 4 NA
ZNF385A 25946 Zinc finger proteins, such as ZNF385A, are regulatory proteins that act as transcription factors, bind single- or double-stranded RNA, or interact with other proteins (Sharma et al., 2004 [PubMed 15527981]). ENSG00000161642 zinc finger protein 385A NA
RP11-155G14.6 ENSG00000240758 NA ENSG00000240758 NA NA
HLA-DRB1 3123 HLA-DRB1 belongs to the HLA class II beta chain paralogs. The class II molecule is a heterodimer consisting of an alpha (DRA) and a beta chain (DRB), both anchored in the membrane. It plays a central role in the immune system by presenting peptides derived from extracellular proteins. Class II molecules are expressed in antigen presenting cells (APC: B lymphocytes, dendritic cells, macrophages). The beta chain is approximately 26-28 kDa. It is encoded by 6 exons. Exon one encodes the leader peptide; exons 2 and 3 encode the two extracellular domains; exon 4 encodes the transmembrane domain; and exon 5 encodes the cytoplasmic tail. Within the DR molecule the beta chain contains all the polymorphisms specifying the peptide binding specificities. Hundreds of DRB1 alleles have been described and typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. DRB1 is expressed at a level five times higher than its paralogs DRB3, DRB4 and DRB5. DRB1 is present in all individuals. Allelic variants of DRB1 are linked with either none or one of the genes DRB3, DRB4 and DRB5. There are 4 related pseudogenes: DRB2, DRB6, DRB7, DRB8 and DRB9. ENSG00000196126 major histocompatibility complex, class II, DR beta 1 NA
LOC105369230 105369230 NA ENSG00000196126 HLA class II histocompatibility antigen, DRB1-7 beta chain NA
CPXM1 56265 This gene likely encodes a member of the carboxypeptidase family of proteins. Cloning of a comparable locus in mouse indicates that the encoded protein contains a discoidin domain and a carboxypeptidase domain, but the protein appears to lack residues necessary for carboxypeptidase activity. ENSG00000088882 carboxypeptidase X (M14 family), member 1 NA
RP11-809N8.2 ENSG00000256928 NA ENSG00000256928 NA NA
PDIA2 64714 Protein disulfide isomerases (EC 5.3.4.1), such as PDIP, are endoplasmic reticulum (ER) resident proteins that catalyze protein folding and thiol-disulfide interchange reactions (Desilva et al., 1996 [PubMed 8561901]). ENSG00000185615 protein disulfide isomerase family A member 2 NA
CCDC74A 90557 NA ENSG00000163040 coiled-coil domain containing 74A NA
WNT11 7481 The WNT gene family consists of structurally related genes which encode secreted signaling proteins. These proteins have been implicated in oncogenesis and in several developmental processes, including regulation of cell fate and patterning during embryogenesis. This gene is a member of the WNT gene family. It encodes a protein which shows 97%, 85%, and 63% amino acid identity with mouse, chicken, and Xenopus Wnt11 protein, respectively. This gene may play roles in the development of skeleton, kidney and lung, and is considered to be a plausible candidate gene for High Bone Mass Syndrome. ENSG00000085741 Wnt family member 11 NA
RGS5 8490 This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. ENSG00000143248 regulator of G-protein signaling 5 NA
NA NA NA ENSG00000270172 NA TRUE
DEFA5 1670 Defensins are a family of antimicrobial and cytotoxic peptides thought to be involved in host defense. They are abundant in the granules of neutrophils and also found in the epithelia of mucosal surfaces such as those of the intestine, respiratory tract, urinary tract, and vagina. Members of the defensin family are highly similar in protein sequence and distinguished by a conserved cysteine motif. Several of the alpha defensin genes appear to be clustered on chromosome 8. The protein encoded by this gene, defensin, alpha 5, is highly expressed in the secretory granules of Paneth cells of the ileum. ENSG00000164816 defensin alpha 5 NA
NCF2 4688 This gene encodes neutrophil cytosolic factor 2, the 67-kilodalton cytosolic subunit of the multi-protein NADPH oxidase complex found in neutrophils. This oxidase produces a burst of superoxide which is delivered to the lumen of the neutrophil phagosome. Mutations in this gene, as well as in other NADPH oxidase subunits, can result in chronic granulomatous disease, a disease that causes recurrent infections by catalase-positive organisms. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000116701 neutrophil cytosolic factor 2 NA
LY6G6C 80740 LY6G6C belongs to a cluster of leukocyte antigen-6 (LY6) genes located in the major histocompatibility complex (MHC) class III region on chromosome 6. Members of the LY6 superfamily typically contain 70 to 80 amino acids, including 8 to 10 cysteines. Most LY6 proteins are attached to the cell surface by a glycosylphosphatidylinositol (GPI) anchor that is directly involved in signal transduction (Mallya et al., 2002 [PubMed 12079290]). ENSG00000204421 lymphocyte antigen 6 complex, locus G6C NA
RP11-510N19.5 ENSG00000249007 NA ENSG00000249007 NA NA
CELA3B 23436 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3B has little elastolytic activity. Like most of the human elastases, elastase 3B is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3B preferentially cleaves proteins after alanine residues. Elastase 3B may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1, and excretion of this protein in fecal material is frequently used as a measure of pancreatic function in clinical assays. ENSG00000219073 chymotrypsin like elastase family member 3B NA
TMEM97 27346 TMEM97 is a conserved integral membrane protein that plays a role in controlling cellular cholesterol levels (Bartz et al., 2009 [PubMed 19583955]). ENSG00000109084 transmembrane protein 97 NA
CLMP 79827 This gene encodes a type I transmembrane protein that is localized to junctional complexes between endothelial and epithelial cells and may have a role in cell-cell adhesion. Expression of this gene in white adipose tissue is implicated in adipocyte maturation and development of obesity. This gene is also essential for normal intestinal development and mutations in the gene are associated with congenital short bowel syndrome. ENSG00000166250 CXADR-like membrane protein NA
AC073410.1 ENSG00000236047 NA ENSG00000236047 NA NA
# Save the Ensembl gene IDs queried for factor 6, one ID per line.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 6, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
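
The annotation table above shows two cases worth checking before the IDs are written out: ENSG00000196126 is returned twice (HLA-DRB1 and LOC105369230), and ENSG00000270172 comes back with notfound set to TRUE. A minimal sketch of such a check follows, using a toy data frame in place of as.data.frame(out). The helper name summarise_hits is hypothetical and is not part of the SFA scripts or of mygene.

```r
# Toy stand-in for as.data.frame(out); in the analysis, `out` comes from
# mygene::queryMany(). Only the columns needed for the check are kept.
out_df <- data.frame(
  query    = c("ENSG00000196126", "ENSG00000196126",
               "ENSG00000270172", "ENSG00000088882"),
  symbol   = c("HLA-DRB1", "LOC105369230", NA, "CPXM1"),
  notfound = c(NA, NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: list the distinct query IDs, the IDs with no match,
# and the IDs that matched more than one record.
summarise_hits <- function(df) {
  matched <- is.na(df$notfound) | !df$notfound
  list(
    unique_queries = unique(df$query),
    unmatched      = unique(df$query[!matched]),
    multi_hit      = unique(df$query[duplicated(df$query)])
  )
}

hits <- summarise_hits(out_df)

# Writing the de-duplicated IDs avoids repeated lines in the output file.
tmp_file <- tempfile(fileext = ".txt")
writeLines(hits$unique_queries, tmp_file)
```

Whether duplicated IDs should be collapsed depends on how the downstream tool reads the file, so this is a design choice rather than a required step.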

Factor 7 Annotations

out <- mygene::queryMany(gene_list[7,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name X_id summary symbol query notfound
HOP homeobox 84525 The protein encoded by this gene is a homeodomain protein that lacks certain conserved residues required for DNA binding. It was reported that choriocarcinoma cell lines and tissues failed to express this gene, which suggested the possible involvement of this gene in malignant conversion of placental trophoblasts. Studies in mice suggest that this protein may interact with serum response factor (SRF) and modulate SRF-dependent cardiac-specific gene expression and cardiac development. Multiple alternatively spliced transcript variants have been identified for this gene. HOPX ENSG00000171476 NA
solute carrier family 1 member 3 6507 This gene encodes a member of a high affinity glutamate transporter family. This gene functions in the termination of excitatory neurotransmission in the central nervous system. Mutations are associated with episodic ataxia, Type 6. Alternative splicing results in multiple transcript variants. SLC1A3 ENSG00000079215 NA
fatty acid binding protein 5 pseudogene 7 ENSG00000234964 NA FABP5P7 ENSG00000234964 NA
peptidyl arginine deiminase 2 11240 This gene encodes a member of the peptidyl arginine deiminase family of enzymes, which catalyze the post-translational deimination of proteins by converting arginine residues into citrullines in the presence of calcium ions. The family members have distinct substrate specificities and tissue-specific expression patterns. The type II enzyme is the most widely expressed family member. Known substrates for this enzyme include myelin basic protein in the central nervous system and vimentin in skeletal muscle and macrophages. This enzyme is thought to play a role in the onset and progression of neurodegenerative human disorders, including Alzheimer disease and multiple sclerosis, and it has also been implicated in glaucoma pathogenesis. This gene exists in a cluster with four other paralogous genes. PADI2 ENSG00000117115 NA
family with sequence similarity 171 member A2 284069 NA FAM171A2 ENSG00000161682 NA
cytochrome P450 family 4 subfamily F member 29, pseudogene 54055 NA CYP4F29P ENSG00000228314 NA
protein disulfide isomerase family A member 2 64714 Protein disulfide isomerases (EC 5.3.4.1), such as PDIP, are endoplasmic reticulum (ER) resident proteins that catalyze protein folding and thiol-disulfide interchange reactions (Desilva et al., 1996 [PubMed 8561901]). PDIA2 ENSG00000185615 NA
purinergic receptor P2X 1 5023 The protein encoded by this gene belongs to the P2X family of G-protein-coupled receptors. These proteins can form homo- and heterotrimers and function as ATP-gated ion channels and mediate rapid and selective permeability to cations. This protein is primarily localized to smooth muscle, where it binds ATP and mediates synaptic transmission between neurons and from neurons to smooth muscle, and may be responsible for sympathetic vasoconstriction in small arteries, arterioles and vas deferens. Mouse studies suggest that this receptor is essential for normal male reproductive function. This protein may also be involved in promoting apoptosis. P2RX1 ENSG00000108405 NA
regenerating family member 1 beta 5968 This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. REG1B ENSG00000172023 NA
C-type lectin domain family 2 member B 9976 This gene encodes a member of the C-type lectin/C-type lectin-like domain (CTL/CTLD) superfamily. Members of this family share a common protein fold and have diverse functions, such as cell adhesion, cell-cell signalling, glycoprotein turnover, and roles in inflammation and immune response. The encoded type 2 transmembrane protein may function as a cell activation antigen. An alternative splice variant has been described but its full-length sequence has not been determined. This gene is closely linked to other CTL/CTLD superfamily members on chromosome 12p13 in the natural killer gene complex region. CLEC2B ENSG00000110852 NA
regenerating family member 1 alpha 5967 This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. REG1A ENSG00000115386 NA
mutated in colorectal cancers 4163 This gene is a candidate colorectal tumor suppressor gene that is thought to negatively regulate cell cycle progression. The orthologous gene in the mouse expresses a phosphoprotein associated with the plasma membrane and membrane organelles, and overexpression of the mouse protein inhibits entry into S phase. Multiple transcript variants encoding different isoforms have been found for this gene. MCC ENSG00000171444 NA
regenerating family member 3 alpha 5068 This gene encodes a pancreatic secretory protein that may be involved in cell proliferation or differentiation. It has similarity to the C-type lectin superfamily. The enhanced expression of this gene is observed during pancreatic inflammation and liver carcinogenesis. The mature protein also functions as an antimicrobial protein with antibacterial activity. Alternate splicing results in multiple transcript variants that encode the same protein. REG3A ENSG00000172016 NA
protease, serine 1 5644 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. PRSS1 ENSG00000204983 NA
carboxypeptidase X (M14 family), member 1 56265 This gene likely encodes a member of the carboxypeptidase family of proteins. Cloning of a comparable locus in mouse indicates that the encoded protein contains a discoidin domain and a carboxypeptidase domain, but the protein appears to lack residues necessary for carboxypeptidase activity. CPXM1 ENSG00000088882 NA
carcinoembryonic antigen related cell adhesion molecule 1 634 This gene encodes a member of the carcinoembryonic antigen (CEA) gene family, which belongs to the immunoglobulin superfamily. Two subgroups of the CEA family, the CEA cell adhesion molecules and the pregnancy-specific glycoproteins, are located within a 1.2 Mb cluster on the long arm of chromosome 19. Eleven pseudogenes of the CEA cell adhesion molecule subgroup are also found in the cluster. The encoded protein was originally described in bile ducts of liver as biliary glycoprotein. Subsequently, it was found to be a cell-cell adhesion molecule detected on leukocytes, epithelia, and endothelia. The encoded protein mediates cell adhesion via homophilic as well as heterophilic binding to other proteins of the subgroup. Multiple cellular activities have been attributed to the encoded protein, including roles in the differentiation and arrangement of tissue three-dimensional structure, angiogenesis, apoptosis, tumor suppression, metastasis, and the modulation of innate and adaptive immune responses. Multiple transcript variants encoding different isoforms have been reported, but the full-length nature of all variants has not been defined. CEACAM1 ENSG00000079385 NA
dual oxidase 1 53905 The protein encoded by this gene is a glycoprotein and a member of the NADPH oxidase family. The synthesis of thyroid hormone is catalyzed by a protein complex located at the apical membrane of thyroid follicular cells. This complex contains an iodide transporter, thyroperoxidase, and a peroxide generating system that includes proteins encoded by this gene and the similar DUOX2 gene. This protein is known as dual oxidase because it has both a peroxidase homology domain and a gp91phox domain. This protein generates hydrogen peroxide and thereby plays a role in the activity of thyroid peroxidase, lactoperoxidase, and in lactoperoxidase-mediated antimicrobial defense at mucosal surfaces. Two alternatively spliced transcript variants encoding the same protein have been described for this gene. DUOX1 ENSG00000137857 NA
chymotrypsin like elastase family member 3B 23436 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3B has little elastolytic activity. Like most of the human elastases, elastase 3B is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3B preferentially cleaves proteins after alanine residues. Elastase 3B may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1, and excretion of this protein in fecal material is frequently used as a measure of pancreatic function in clinical assays. CELA3B ENSG00000219073 NA
cytochrome P450 family 11 subfamily A member 1 1583 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the mitochondrial inner membrane and catalyzes the conversion of cholesterol to pregnenolone, the first and rate-limiting step in the synthesis of the steroid hormones. Two transcript variants encoding different isoforms have been found for this gene. The cellular location of the smaller isoform is unclear since it lacks the mitochondrial-targeting transit peptide. CYP11A1 ENSG00000140459 NA
aldo-keto reductase family 7 member A3 22977 Aldo-keto reductases, such as AKR7A3, are involved in the detoxification of aldehydes and ketones. AKR7A3 ENSG00000162482 NA
cellular retinoic acid binding protein 2 1382 This gene encodes a member of the retinoic acid (RA, a form of vitamin A) binding protein family and lipocalin/cytosolic fatty-acid binding protein family. The protein is a cytosol-to-nuclear shuttling protein, which facilitates RA binding to its cognate receptor complex and transfer to the nucleus. It is involved in the retinoid signaling pathway, and is associated with increased circulating low-density lipoprotein cholesterol. Alternatively spliced transcript variants encoding the same protein have been found for this gene. CRABP2 ENSG00000143320 NA
NA ENSG00000272275 NA RP11-791G15.2 ENSG00000272275 NA
vomeronasal 1 receptor 82 pseudogene ENSG00000268995 NA VN1R82P ENSG00000268995 NA
solute carrier family 39 member 5 283375 The protein encoded by this gene belongs to the ZIP family of zinc transporters that transport zinc into cells from outside, and play a crucial role in controlling intracellular zinc levels. Zinc is an essential cofactor for many enzymes and proteins involved in gene transcription, growth, development and differentiation. Mutations in this gene have been associated with autosomal dominant high myopia (MYP24). Alternatively spliced transcript variants have been found for this gene. SLC39A5 ENSG00000139540 NA
glycoprotein 2 2813 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. GP2 ENSG00000169347 NA
leucine rich repeat containing 4 64101 This gene is significantly downregulated in primary brain tumors. The exact function of the protein encoded by this gene is unknown. LRRC4 ENSG00000128594 NA
erythrocyte membrane protein band 4.1 like 3 23136 NA EPB41L3 ENSG00000082397 NA
transthyretin 7276 This gene encodes transthyretin, one of the three prealbumins including alpha-1-antitrypsin, transthyretin and orosomucoid. Transthyretin is a carrier protein; it transports thyroid hormones in the plasma and cerebrospinal fluid, and also transports retinol (vitamin A) in the plasma. The protein consists of a tetramer of identical subunits. More than 80 different mutations in this gene have been reported; most mutations are related to amyloid deposition, affecting predominantly peripheral nerve and/or the heart, and a small portion of the gene mutations is non-amyloidogenic. The diseases caused by mutations include amyloidotic polyneuropathy, euthyroid hyperthyroxinaemia, amyloidotic vitreous opacities, cardiomyopathy, oculoleptomeningeal amyloidosis, meningocerebrovascular amyloidosis, carpal tunnel syndrome, etc. TTR ENSG00000118271 NA
alkaline phosphatase, liver/bone/kidney 249 This gene encodes a member of the alkaline phosphatase family of proteins. There are at least four distinct but related alkaline phosphatases: intestinal, placental, placental-like, and liver/bone/kidney (tissue non-specific). The first three are located together on chromosome 2, while the tissue non-specific form is located on chromosome 1. The product of this gene is a membrane bound glycosylated enzyme that is not expressed in any particular tissue and is, therefore, referred to as the tissue-nonspecific form of the enzyme. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature enzyme. This enzyme may play a role in bone mineralization. Mutations in this gene have been linked to hypophosphatasia, a disorder that is characterized by hypercalcemia and skeletal defects. ALPL ENSG00000162551 NA
protocadherin 18 54510 This gene belongs to the protocadherin gene family, a subfamily of the cadherin superfamily. This gene encodes a protein which contains 6 extracellular cadherin domains, a transmembrane domain and a cytoplasmic tail differing from those of the classical cadherins. Although its specific function is undetermined, the cadherin-related neuronal receptor is thought to play a role in the establishment and function of specific cell-cell connections in the brain. PCDH18 ENSG00000189184 NA
adenylate cyclase 4 196883 This gene encodes a member of the family of adenylate cyclases, which are membrane-associated enzymes that catalyze the formation of the secondary messenger cyclic adenosine monophosphate (cAMP). Mouse studies show that adenylate cyclase 4, along with adenylate cyclases 2 and 3, is expressed in olfactory cilia, suggesting that several different adenylate cyclases may couple to olfactory receptors and that there may be multiple receptor-mediated mechanisms for the generation of cAMP signals. Alternative splicing results in transcript variants. ADCY4 ENSG00000129467 NA
chymotrypsin like elastase family member 3A 10136 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. CELA3A ENSG00000142789 NA
purinergic receptor P2X 7 5027 The product of this gene belongs to the family of purinoceptors for ATP. This receptor functions as a ligand-gated ion channel and is responsible for ATP-dependent lysis of macrophages through the formation of membrane pores permeable to large molecules. Activation of this nuclear receptor by ATP in the cytoplasm may be a mechanism by which cellular activity can be coupled to changes in gene expression. Multiple alternatively spliced variants have been identified, most of which fit nonsense-mediated decay (NMD) criteria. P2RX7 ENSG00000089041 NA
keratin 19 3880 The protein encoded by this gene is a member of the keratin family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. The type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. Unlike its related family members, this smallest known acidic cytokeratin is not paired with a basic cytokeratin in epithelial cells. It is specifically expressed in the periderm, the transiently superficial layer that envelopes the developing epidermis. The type I cytokeratins are clustered in a region of chromosome 17q12-q21. KRT19 ENSG00000171345 NA
apolipoprotein C3 345 Apolipoprotein C-III is a very low density lipoprotein (VLDL) protein. APOC3 inhibits lipoprotein lipase and hepatic lipase; it is thought to delay catabolism of triglyceride-rich particles. The APOA1, APOC3 and APOA4 genes are closely linked in both rat and human genomes. The A-I and A-IV genes are transcribed from the same strand, while the A-I and C-III genes are convergently transcribed. An increase in apoC-III levels induces the development of hypertriglyceridemia. APOC3 ENSG00000110245 NA
FCH domain only 2 115548 NA FCHO2 ENSG00000157107 NA
peroxisomal biogenesis factor 16 9409 The protein encoded by this gene is an integral peroxisomal membrane protein. An inactivating nonsense mutation localized to this gene was observed in a patient with Zellweger syndrome of the complementation group CGD/CG9. Expression of this gene product morphologically and biochemically restores the formation of new peroxisomes, suggesting a role in peroxisome organization and biogenesis. Alternative splicing has been observed for this gene and two variants have been described. PEX16 ENSG00000121680 NA
delta like non-canonical Notch ligand 1 8788 This gene encodes a transmembrane protein that contains multiple epidermal growth factor repeats and functions as a regulator of cell growth. The encoded protein is involved in the differentiation of several cell types including adipocytes. This gene is located in a region of chromosome 14 frequently showing uniparental disomy, and is imprinted and expressed from the paternal allele. A single nucleotide variant in this gene is associated with child and adolescent obesity and shows polar overdominance, where heterozygotes carrying an active paternal allele express the phenotype, while mutant homozygotes are normal. DLK1 ENSG00000185559 NA
receptor activity modifying protein 1 10267 The protein encoded by this gene is a member of the RAMP family of single-transmembrane-domain proteins, called receptor (calcitonin) activity modifying proteins (RAMPs). RAMPs are type I transmembrane proteins with an extracellular N terminus and a cytoplasmic C terminus. RAMPs are required to transport calcitonin-receptor-like receptor (CRLR) to the plasma membrane. CRLR, a receptor with seven transmembrane domains, can function as either a calcitonin-gene-related peptide (CGRP) receptor or an adrenomedullin receptor, depending on which members of the RAMP family are expressed. In the presence of this (RAMP1) protein, CRLR functions as a CGRP receptor. The RAMP1 protein is involved in the terminal glycosylation, maturation, and presentation of the CGRP receptor to the cell surface. Alternative splicing results in multiple transcript variants encoding different isoforms. RAMP1 ENSG00000132329 NA
NA ENSG00000259185 NA RP11-56B16.4 ENSG00000259185 NA
protein kinase (cAMP-dependent, catalytic) inhibitor alpha 5569 The protein encoded by this gene is a member of the cAMP-dependent protein kinase (PKA) inhibitor family. This protein was demonstrated to interact with and inhibit the activities of both C alpha and C beta catalytic subunits of the PKA. Alternatively spliced transcript variants encoding the same protein have been reported. PKIA ENSG00000171033 NA
NA ENSG00000254198 NA RP11-598P20.3 ENSG00000254198 NA
zinc finger protein 471 57573 NA ZNF471 ENSG00000196263 NA
solute carrier family 6 member 16 28968 SLC6A16 shows structural characteristics of an Na(+)- and Cl(-)-dependent neurotransmitter transporter, including 12 transmembrane (TM) domains, intracellular N and C termini, and large extracellular loops containing multiple N-glycosylation sites. SLC6A16 ENSG00000063127 NA
3-hydroxyanthranilate 3,4-dioxygenase 23498 3-Hydroxyanthranilate 3,4-dioxygenase is a monomeric cytosolic protein belonging to the family of intramolecular dioxygenases containing nonheme ferrous iron. It is widely distributed in peripheral organs, such as liver and kidney, and is also present in low amounts in the central nervous system. HAAO catalyzes the synthesis of quinolinic acid (QUIN) from 3-hydroxyanthranilic acid. QUIN is an excitotoxin whose toxicity is mediated by its ability to activate glutamate N-methyl-D-aspartate receptors. Increased cerebral levels of QUIN may participate in the pathogenesis of neurologic and inflammatory disorders. HAAO has been suggested to play a role in disorders associated with altered tissue levels of QUIN. HAAO ENSG00000162882 NA
lipoprotein lipase 4023 LPL encodes lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue. LPL functions as a homodimer, and has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Severe mutations that cause LPL deficiency result in type I hyperlipoproteinemia, while less extreme mutations in LPL are linked to many disorders of lipoprotein metabolism. LPL ENSG00000175445 NA
coiled-coil domain containing 181 57821 NA CCDC181 ENSG00000117477 NA
tumor suppressor candidate 3 7991 This gene is a candidate tumor suppressor gene. It is located within a homozygously deleted region of a metastatic prostate cancer. The gene is expressed in most nonlymphoid human tissues including prostate, lung, liver, and colon. Expression was also detected in many epithelial tumor cell lines. Two transcript variants encoding distinct isoforms have been identified for this gene. TUSC3 ENSG00000104723 NA
periplakin 5493 The protein encoded by this gene is a component of desmosomes and of the epidermal cornified envelope in keratinocytes. The N-terminal domain of this protein interacts with the plasma membrane and its C-terminus interacts with intermediate filaments. Through its rod domain, this protein forms complexes with envoplakin. This protein may serve as a link between the cornified envelope and desmosomes as well as intermediate filaments. AKT1/PKB, a protein kinase mediating a variety of cell growth and survival signaling processes, is reported to interact with this protein, suggesting a possible role for this protein as a localization signal in AKT1-mediated signaling. PPL ENSG00000118898 NA
pancreatic lipase 5406 This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. PNLIP ENSG00000175535 NA
ATP binding cassette subfamily A member 1 19 The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intracellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the ABC1 subfamily. Members of the ABC1 subfamily comprise the only major ABC subfamily found exclusively in multicellular eukaryotes. With cholesterol as its substrate, this protein functions as a cholesterol efflux pump in the cellular lipid removal pathway. Mutations in this gene have been associated with Tangier’s disease and familial high-density lipoprotein deficiency. ABCA1 ENSG00000165029 NA
ribosomal protein L3 pseudogene 4 ENSG00000232573 NA RPL3P4 ENSG00000232573 NA
Finkel-Biskis-Reilly murine sarcoma virus (FBR-MuSV) ubiquitously expressed 2197 This gene is the cellular homolog of the fox sequence in the Finkel-Biskis-Reilly murine sarcoma virus (FBR-MuSV). It encodes a fusion protein consisting of the ubiquitin-like protein fubi at the N terminus and ribosomal protein S30 at the C terminus. It has been proposed that the fusion protein is post-translationally processed to generate free fubi and free ribosomal protein S30. Fubi is a member of the ubiquitin family, and ribosomal protein S30 belongs to the S30E family of ribosomal proteins. Whereas the function of fubi is currently unknown, ribosomal protein S30 is a component of the 40S subunit of the cytoplasmic ribosome and displays antimicrobial activity. Pseudogenes derived from this gene are present in the genome. Similar to ribosomal protein S30, ribosomal proteins S27a and L40 are synthesized as fusion proteins with ubiquitin. FAU ENSG00000149806 NA
heat shock protein family A (Hsp70) member 12B 116835 The protein encoded by this gene contains an atypical heat shock protein 70 (Hsp70) ATPase domain and is therefore a distant member of the mammalian Hsp70 family. This gene may be involved in susceptibility to atherosclerosis. Alternative splicing results in multiple transcript variants encoding different isoforms. HSPA12B ENSG00000132622 NA
NA ENSG00000258177 NA RP11-394J1.2 ENSG00000258177 NA
FERM domain containing 6 122786 NA FRMD6 ENSG00000139926 NA
cornifelin 84518 NA CNFN ENSG00000105427 NA
SIVA1 apoptosis inducing factor 10572 This gene encodes a protein with an important role in the apoptotic (programmed cell death) pathway induced by the CD27 antigen, a member of the tumor necrosis factor receptor (TNFR) superfamily. The CD27 antigen cytoplasmic tail binds to the N-terminus of this protein. Two alternatively spliced transcript variants encoding distinct proteins have been described. SIVA1 ENSG00000184990 NA
C-X-C motif chemokine ligand 10 3627 This antimicrobial gene encodes a chemokine of the CXC subfamily and ligand for the receptor CXCR3. Binding of this protein to CXCR3 results in pleiotropic effects, including stimulation of monocytes, natural killer and T-cell migration, and modulation of adhesion molecule expression. CXCL10 ENSG00000169245 NA
RALA Ras like proto-oncogene A 5898 The product of this gene belongs to the small GTPase superfamily, Ras family of proteins. GTP-binding proteins mediate the transmembrane signaling initiated by the occupancy of certain cell surface receptors. This gene encodes a low molecular mass ras-like GTP-binding protein that shares about 50% similarity with other ras proteins. RALA ENSG00000006451 NA
long intergenic non-protein coding RNA 1094 ENSG00000251442 NA LINC01094 ENSG00000251442 NA
myelin expression factor 2 50804 NA MYEF2 ENSG00000104177 NA
C-X-C motif chemokine ligand 14 9547 This antimicrobial gene belongs to the cytokine gene family which encode secreted proteins involved in immunoregulatory and inflammatory processes. The protein encoded by this gene is structurally related to the CXC (Cys-X-Cys) subfamily of cytokines. Members of this subfamily are characterized by two cysteines separated by a single amino acid. This cytokine displays chemotactic activity for monocytes but not for lymphocytes, dendritic cells, neutrophils or macrophages. It has been implicated that this cytokine is involved in the homeostasis of monocyte-derived macrophages rather than in inflammation. CXCL14 ENSG00000145824 NA
purinergic receptor P2Y1 5028 The product of this gene belongs to the family of G-protein coupled receptors. This family has several receptor subtypes with different pharmacological selectivity, which overlaps in some cases, for various adenosine and uridine nucleotides. This receptor functions as a receptor for extracellular ATP and ADP. In platelets binding to ADP leads to mobilization of intracellular calcium ions via activation of phospholipase C, a change in platelet shape, and probably to platelet aggregation. P2RY1 ENSG00000169860 NA
family with sequence similarity 20 member A 54757 This locus encodes a protein that is likely secreted and may function in hematopoiesis. A mutation at this locus has been associated with amelogenesis imperfecta and gingival hyperplasia syndrome. Alternatively spliced transcript variants have been identified. FAM20A ENSG00000108950 NA
collectin subfamily member 12 81035 This gene encodes a member of the C-lectin family, proteins that possess collagen-like sequences and carbohydrate recognition domains. This protein is a scavenger receptor, a cell surface glycoprotein that displays several functions associated with host defense. It can bind to carbohydrate antigens on microorganisms, facilitating their recognition and removal. It also mediates the recognition, internalization, and degradation of oxidatively modified low density lipoprotein by vascular endothelial cells. COLEC12 ENSG00000158270 NA
troponin C1, slow skeletal and cardiac type 7134 Troponin is a central regulatory protein of striated muscle contraction, and together with tropomyosin, is located on the actin filament. Troponin consists of 3 subunits: TnI, which is the inhibitor of actomyosin ATPase; TnT, which contains the binding site for tropomyosin; and TnC, the protein encoded by this gene. The binding of calcium to TnC abolishes the inhibitory action of TnI, thus allowing the interaction of actin with myosin, the hydrolysis of ATP, and the generation of tension. Mutations in this gene are associated with cardiomyopathy dilated type 1Z. TNNC1 ENSG00000114854 NA
F-box protein 16 157574 This gene encodes a member of the F-box protein family, members of which are characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into three classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbx class. Multiple transcript variants encoding different isoforms have been found for this gene. FBXO16 ENSG00000214050 NA
nestin 10763 This gene encodes a member of the intermediate filament protein family and is expressed primarily in nerve cells. NES ENSG00000132688 NA
cadherin EGF LAG seven-pass G-type receptor 1 9620 The protein encoded by this gene is a member of the flamingo subfamily, part of the cadherin superfamily. The flamingo subfamily consists of nonclassic-type cadherins; a subpopulation that does not interact with catenins. The flamingo cadherins are located at the plasma membrane and have nine cadherin domains, seven epidermal growth factor-like repeats and two laminin A G-type repeats in their ectodomain. They also have seven transmembrane domains, a characteristic unique to this subfamily. It is postulated that these proteins are receptors involved in contact-mediated communication, with cadherin domains acting as homophilic binding regions and the EGF-like domains involved in cell adhesion and receptor-ligand interactions. This particular member is a developmentally regulated, neural-specific gene which plays an unspecified role in early embryogenesis. CELSR1 ENSG00000075275 NA
EPH receptor A4 2043 This gene belongs to the ephrin receptor subfamily of the protein-tyrosine kinase family. EPH and EPH-related receptors have been implicated in mediating developmental events, particularly in the nervous system. Receptors in the EPH subfamily typically have a single kinase domain and an extracellular region containing a Cys-rich domain and 2 fibronectin type III repeats. The ephrin receptors are divided into 2 groups based on the similarity of their extracellular domain sequences and their affinities for binding ephrin-A and ephrin-B ligands. Multiple transcript variants encoding different isoforms have been found for this gene. EPHA4 ENSG00000116106 NA
angiomotin like 2 51421 Angiomotin is a protein that binds angiostatin, a circulating inhibitor of the formation of new blood vessels (angiogenesis). Angiomotin mediates angiostatin inhibition of endothelial cell migration and tube formation in vitro. The protein encoded by this gene is related to angiomotin and is a member of the motin protein family. Alternative splicing results in multiple transcript variants of this gene. AMOTL2 ENSG00000114019 NA
ATP binding cassette subfamily C member 5 10057 The protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intra-cellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the MRP subfamily which is involved in multi-drug resistance. This protein functions in the cellular export of its substrate, cyclic nucleotides. This export contributes to the degradation of phosphodiesterases and possibly an elimination pathway for cyclic nucleotides. Studies show that this protein provides resistance to thiopurine anticancer drugs, 6-mercaptopurine and thioguanine, and the anti-HIV drug 9-(2-phosphonylmethoxyethyl)adenine. This protein may be involved in resistance to thiopurines in acute lymphoblastic leukemia and antiretroviral nucleoside analogs in HIV-infected patients. Alternative splicing results in multiple transcript variants. ABCC5 ENSG00000114770 NA
BCL2/adenovirus E1B 19kD interacting protein like 149428 The protein encoded by this gene interacts with several other proteins, such as BCL2, ARHGAP1, MIF and GFER. It may function as a bridge molecule between BCL2 and ARHGAP1/CDC42 in promoting cell death. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. BNIPL ENSG00000163141 NA
sulfotransferase family 2B member 1 6820 Sulfotransferase enzymes catalyze the sulfate conjugation of many hormones, neurotransmitters, drugs, and xenobiotic compounds. These cytosolic enzymes are different in their tissue distributions and substrate specificities. The gene structure (number and length of exons) is similar among family members. This gene sulfates dehydroepiandrosterone but not 4-nitrophenol, a typical substrate for the phenol and estrogen sulfotransferase subfamilies. Two alternatively spliced variants that encode different isoforms have been described. SULT2B1 ENSG00000088002 NA
NA NA NA NA ENSG00000250404 TRUE
lipase F, gastric type 8513 This gene encodes gastric lipase, an enzyme involved in the digestion of dietary triglycerides in the gastrointestinal tract, and responsible for 30% of fat digestion processes occurring in human. It is secreted by gastric chief cells in the fundic mucosa of the stomach, and it hydrolyzes the ester bonds of triglycerides under acidic pH conditions. The gene is a member of a conserved gene family of lipases that play distinct roles in neutral lipid metabolism. Several transcript variants encoding different isoforms have been found for this gene. LIPF ENSG00000182333 NA
gremlin 1, DAN family BMP antagonist 26585 This gene encodes a member of the BMP (bone morphogenic protein) antagonist family. Like BMPs, BMP antagonists contain cystine knots and typically form homo- and heterodimers. The CAN (cerberus and dan) subfamily of BMP antagonists, to which this gene belongs, is characterized by a C-terminal cystine knot with an eight-membered ring. The antagonistic effect of the secreted glycosylated protein encoded by this gene is likely due to its direct binding to BMP proteins. As an antagonist of BMP, this gene may play a role in regulating organogenesis, body patterning, and tissue differentiation. In mouse, this protein has been shown to relay the sonic hedgehog (SHH) signal from the polarizing region to the apical ectodermal ridge during limb bud outgrowth. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. GREM1 ENSG00000166923 NA
V-set and immunoglobulin domain containing 10 54621 NA VSIG10 ENSG00000176834 NA
NA NA NA NA ENSG00000229874 TRUE
NA ENSG00000257831 NA RP11-596D21.1 ENSG00000257831 NA
growth hormone receptor 2690 This gene encodes a member of the type I cytokine receptor family, which is a transmembrane receptor for growth hormone. Binding of growth hormone to the receptor leads to receptor dimerization and the activation of an intra- and intercellular signal transduction pathway leading to growth. Mutations in this gene have been associated with Laron syndrome, also known as the growth hormone insensitivity syndrome (GHIS), a disorder characterized by short stature. In humans and rabbits, but not rodents, growth hormone binding protein (GHBP) is generated by proteolytic cleavage of the extracellular ligand-binding domain from the mature growth hormone receptor protein. Multiple alternatively spliced transcript variants have been found for this gene. GHR ENSG00000112964 NA
NA NA NA NA ENSG00000250606 TRUE
dedicator of cytokinesis 4 9732 This gene is a member of the dedicator of cytokinesis (DOCK) family and encodes a protein with a DHR-1 (CZH-1) domain, a DHR-2 (CZH-2) domain and an SH3 domain. This membrane-associated, cytoplasmic protein functions as a guanine nucleotide exchange factor and is involved in regulation of adherens junctions between cells. Mutations in this gene have been associated with ovarian, prostate, glioma, and colorectal cancers. Alternatively spliced variants which encode different protein isoforms have been described, but only one has been fully characterized. DOCK4 ENSG00000128512 NA
serine peptidase inhibitor, Kazal type 5 11005 This gene encodes a multidomain serine protease inhibitor that contains 15 potential inhibitory domains. The encoded preproprotein is proteolytically processed to generate multiple protein products, which may exhibit unique activities and specificities. These proteins may play a role in skin and hair morphogenesis, as well as anti-inflammatory and antimicrobial protection of mucous epithelia. Mutations in this gene may result in Netherton syndrome, a disorder characterized by ichthyosis, defective cornification, and atopy. This gene is present in a gene cluster on chromosome 5. Alternative splicing results in multiple transcript variants. SPINK5 ENSG00000133710 NA
RAB20, member RAS oncogene family 55647 NA RAB20 ENSG00000139832 NA
zinc finger E-box binding homeobox 2 9839 The protein encoded by this gene is a member of the Zfh1 family of 2-handed zinc finger/homeodomain proteins. It is located in the nucleus and functions as a DNA-binding transcriptional repressor that interacts with activated SMADs. Mutations in this gene are associated with Hirschsprung disease/Mowat-Wilson syndrome. Alternatively spliced transcript variants have been found for this gene. ZEB2 ENSG00000169554 NA
surfactant protein C 6440 This gene encodes the pulmonary-associated surfactant protein C (SPC), an extremely hydrophobic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 2, also called pulmonary alveolar proteinosis due to surfactant protein C deficiency, and are associated with interstitial lung disease in older infants, children, and adults. Alternatively spliced transcript variants encoding different protein isoforms have been identified. SFTPC ENSG00000168484 NA
NME/NM23 nucleoside diphosphate kinase 2 pseudogene 1 ENSG00000123009 NA NME2P1 ENSG00000123009 NA
neuritin 1 51299 This gene encodes a member of the neuritin family, and is expressed in postmitotic-differentiating neurons of the developmental nervous system and neuronal structures associated with plasticity in the adult. The expression of this gene can be induced by neural activity and neurotrophins. The encoded protein contains a consensus cleavage signal found in glycosylphosphatidylinositol (GPI)-anchored proteins. The encoded protein promotes neurite outgrowth and arborization, suggesting its role in promoting neuritogenesis. Overexpression of the encoded protein may be associated with astrocytoma progression. Alternative splicing results in multiple transcript variants. NRN1 ENSG00000124785 NA
CUB and zona pellucida like domains 1 50624 NA CUZD1 ENSG00000138161 NA
F-box and leucine rich repeat protein 22 283807 This gene encodes a member of the F-box protein family. This F-box protein interacts with S-phase kinase-associated protein 1A and cullin in order to form SCF complexes which function as ubiquitin ligases. FBXL22 ENSG00000197361 NA
prostate transmembrane protein, androgen induced 1 56937 This gene encodes a transmembrane protein that contains a Smad interacting motif (SIM). Expression of this gene is induced by androgens and transforming growth factor beta, and the encoded protein suppresses the androgen receptor and transforming growth factor beta signaling pathways through interactions with Smad proteins. Overexpression of this gene may play a role in multiple types of cancer. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. PMEPA1 ENSG00000124225 NA
uncharacterized LOC105371397 105371397 NA LOC105371397 ENSG00000104731 NA
kelch domain containing 4 54758 NA KLHDC4 ENSG00000104731 NA
uncharacterized LOC100130691 100130691 NA LOC100130691 ENSG00000213963 NA
acyl-CoA thioesterase 11 26027 This gene encodes a member of the acyl-CoA thioesterase family which catalyse the conversion of activated fatty acids to the corresponding non-esterified fatty acid and coenzyme A. Expression of a mouse homolog in brown adipose tissue is induced by low temperatures and repressed by warm temperatures. Higher levels of expression of the mouse homolog have been found in obesity-resistant mice compared with obesity-prone mice, suggesting a role of acyl-CoA thioesterase 11 in obesity. Alternative splicing results in transcript variants. ACOT11 ENSG00000162390 NA
activin A receptor like type 1 94 This gene encodes a type I cell-surface receptor for the TGF-beta superfamily of ligands. It shares with other type I receptors a high degree of similarity in serine-threonine kinase subdomains, a glycine- and serine-rich region (called the GS domain) preceding the kinase domain, and a short C-terminal tail. The encoded protein, sometimes termed ALK1, shares similar domain structures with other closely related ALK or activin receptor-like kinase proteins that form a subfamily of receptor serine/threonine kinases. Mutations in this gene are associated with hemorrhagic telangiectasia type 2, also known as Rendu-Osler-Weber syndrome 2. ACVRL1 ENSG00000139567 NA
hyaluronan synthase 3 3038 The protein encoded by this gene is involved in the synthesis of the unbranched glycosaminoglycan hyaluronan, or hyaluronic acid, which is a major constituent of the extracellular matrix. This gene is a member of the NODC/HAS gene family. Compared to the proteins encoded by other members of this gene family, this protein appears to be more of a regulator of hyaluronan synthesis. Alternative splicing results in multiple transcript variants. HAS3 ENSG00000103044 NA
NA ENSG00000213280 NA RP11-212P7.1 ENSG00000213280 NA
uncharacterized LOC400221 400221 NA FLJ22447 ENSG00000232774 NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_",7,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
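
The Factor 7 table above contains queries that were not matched (rows with notfound = TRUE, e.g. ENSG00000250404) and one Ensembl ID that maps to two records (ENSG00000104731, returned for both LOC105371397 and KLHDC4), so out$query can hold unmatched and repeated IDs. If a de-duplicated list of matched IDs is wanted for downstream use, something like the following sketch might work. The helper name clean_query_ids and the toy data frame are assumptions made for illustration (the rows are copied from the table above), and the file is written to a temporary directory rather than to the path used in this analysis.

```r
# Toy stand-in for the data frame produced by as.data.frame(out);
# the four rows are copied from the Factor 7 table above.
out_toy <- data.frame(
  query    = c("ENSG00000104731", "ENSG00000104731",
               "ENSG00000250404", "ENSG00000162390"),
  symbol   = c("LOC105371397", "KLHDC4", NA, "ACOT11"),
  notfound = c(NA, NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only queries that were matched, one entry per ID.
clean_query_ids <- function(out) {
  if (is.null(out$notfound)) {
    # No 'notfound' column: treat every query as matched.
    found <- rep(TRUE, length(out$query))
  } else {
    # 'notfound' is NA for matched queries and TRUE for unmatched ones.
    found <- is.na(out$notfound) | !out$notfound
  }
  unique(out$query[found])
}

ids <- clean_query_ids(out_toy)

# Written to a temporary file here; substitute the real output path as needed.
out_file <- file.path(tempdir(), "gene_names_clus_7.txt")
write.table(ids, out_file, col.names = FALSE, row.names = FALSE, quote = FALSE)
```

Whether unmatched IDs should be dropped depends on what consumes the file: they are still valid Ensembl identifiers from gene_list, so keeping them (and only removing duplicates) may be the better choice for enrichment tools that do their own lookup.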

Factor 8 Annotations

out <- mygene::queryMany(gene_list[8,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id summary name symbol query notfound
1114 This gene encodes a tyrosine-sulfated secretory protein abundant in peptidergic endocrine cells and neurons. This protein may serve as a precursor for regulatory peptides. chromogranin B CHGB ENSG00000089199 NA
5644 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. protease, serine 1 PRSS1 ENSG00000204983 NA
6590 This gene encodes a secreted inhibitor which protects epithelial tissues from serine proteases. It is found in various secretions including seminal plasma, cervical mucus, and bronchial secretions, and has affinity for trypsin, leukocyte elastase, and cathepsin G. Its inhibitory effect contributes to the immune response by protecting epithelial surfaces from attack by endogenous proteolytic enzymes. This antimicrobial protein has antibacterial, antifungal and antiviral activity. secretory leukocyte peptidase inhibitor SLPI ENSG00000124107 NA
5967 This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. regenerating family member 1 alpha REG1A ENSG00000115386 NA
10136 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. chymotrypsin like elastase family member 3A CELA3A ENSG00000142789 NA
4624 Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. myosin, heavy chain 6, cardiac muscle, alpha MYH6 ENSG00000197616 NA
149428 The protein encoded by this gene interacts with several other proteins, such as BCL2, ARHGAP1, MIF and GFER. It may function as a bridge molecule between BCL2 and ARHGAP1/CDC42 in promoting cell death. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. BCL2/adenovirus E1B 19kD interacting protein like BNIPL ENSG00000163141 NA
388121 NA TNF alpha induced protein 8 like 3 TNFAIP8L3 ENSG00000183578 NA
5406 This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. pancreatic lipase PNLIP ENSG00000175535 NA
1208 The protein encoded by this gene is a cofactor needed by pancreatic lipase for efficient dietary lipid hydrolysis. It binds to the C-terminal, non-catalytic domain of lipase, thereby stabilizing an active conformation and considerably increasing the overall hydrophobic binding site. The gene product allows lipase to anchor noncovalently to the surface of lipid micelles, counteracting the destabilizing influence of intestinal bile salts. This cofactor is only expressed in pancreatic acinar cells, suggesting regulation of expression by tissue-specific elements. Three transcript variants encoding different isoforms have been found for this gene. colipase CLPS ENSG00000137392 NA
23650 The protein encoded by this gene belongs to the TRIM protein family. It has multiple zinc finger motifs and a leucine zipper motif. It has been proposed to form homo- or heterodimers which are involved in nucleic acid binding. Thus, it may act as a transcriptional regulatory factor involved in carcinogenesis and/or differentiation. It may also function in the suppression of radiosensitivity since it is associated with ataxia telangiectasia phenotype. tripartite motif containing 29 TRIM29 ENSG00000137699 NA
2813 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. glycoprotein 2 GP2 ENSG00000169347 NA
63924 This gene encodes a member of the cell death-inducing DNA fragmentation factor-like effector family. Members of this family play important roles in apoptosis. The encoded protein promotes lipid droplet formation in adipocytes and may mediate adipocyte apoptosis. This gene is regulated by insulin and its expression is positively correlated with insulin sensitivity. Mutations in this gene may contribute to insulin resistant diabetes. A pseudogene of this gene is located on the short arm of chromosome 3. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. cell death inducing DFFA like effector c CIDEC ENSG00000187288 NA
ENSG00000255883 NA NA RP11-79P5.10 ENSG00000255883 NA
6382 The protein encoded by this gene is a transmembrane (type I) heparan sulfate proteoglycan and is a member of the syndecan proteoglycan family. The syndecans mediate cell binding, cell signaling, and cytoskeletal organization and syndecan receptors are required for internalization of the HIV-1 tat protein. The syndecan-1 protein functions as an integral membrane protein and participates in cell proliferation, cell migration and cell-matrix interactions via its receptor for extracellular matrix proteins. Altered syndecan-1 expression has been detected in several different tumor types. While several transcript variants may exist for this gene, the full-length natures of only two have been described to date. These two represent the major variants of this gene and encode the same protein. syndecan 1 SDC1 ENSG00000115884 NA
6273 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may have a tumor suppressor function. Chromosomal rearrangements and altered expression of this gene have been implicated in breast cancer. S100 calcium binding protein A2 S100A2 ENSG00000196754 NA
5968 This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. regenerating family member 1 beta REG1B ENSG00000172023 NA
63036 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Like most of the human elastases, elastase 2A is secreted from the pancreas as a zymogen. In other species, elastase 2A has been shown to preferentially cleave proteins after leucine, methionine, and phenylalanine residues. chymotrypsin like elastase family member 2A CELA2A ENSG00000142615 NA
5068 This gene encodes a pancreatic secretory protein that may be involved in cell proliferation or differentiation. It has similarity to the C-type lectin superfamily. The enhanced expression of this gene is observed during pancreatic inflammation and liver carcinogenesis. The mature protein also functions as an antimicrobial protein with antibacterial activity. Alternate splicing results in multiple transcript variants that encode the same protein. regenerating family member 3 alpha REG3A ENSG00000172016 NA
83643 NA coiled-coil domain containing 3 CCDC3 ENSG00000151468 NA
4023 LPL encodes lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue. LPL functions as a homodimer, and has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Severe mutations that cause LPL deficiency result in type I hyperlipoproteinemia, while less extreme mutations in LPL are linked to many disorders of lipoprotein metabolism. lipoprotein lipase LPL ENSG00000175445 NA
7137 Troponin I (TnI), along with troponin T (TnT) and troponin C (TnC), is one of 3 subunits that form the troponin complex of the thin filaments of striated muscle. TnI is the inhibitory subunit; blocking actin-myosin interactions and thereby mediating striated muscle relaxation. The TnI subfamily contains three genes: TnI-skeletal-fast-twitch, TnI-skeletal-slow-twitch, and TnI-cardiac. This gene encodes the TnI-cardiac protein and is exclusively expressed in cardiac muscle tissues. Mutations in this gene cause familial hypertrophic cardiomyopathy type 7 (CMH7) and familial restrictive cardiomyopathy (RCM). troponin I3, cardiac type TNNI3 ENSG00000129991 NA
5319 This gene encodes a secreted member of the phospholipase A2 (PLA2) class of enzymes, which is produced by the pancreatic acinar cells. The encoded calcium-dependent enzyme catalyzes the hydrolysis of the sn-2 position of membrane glycerophospholipids to release arachidonic acid (AA) and lysophospholipids. AA is subsequently converted by downstream metabolic enzymes to several bioactive lipophilic compounds (eicosanoids), including prostaglandins (PGs) and leukotrienes (LTs). The enzyme may be involved in several physiological processes including cell contraction, cell proliferation and pathological response. phospholipase A2 group IB PLA2G1B ENSG00000170890 NA
5625 This gene encodes a mitochondrial protein that catalyzes the first step in proline degradation. Mutations in this gene are associated with hyperprolinemia type 1 and susceptibility to schizophrenia 4 (SCZD4). This gene is located on chromosome 22q11.21, a region which has also been associated with the contiguous gene deletion syndromes, DiGeorge and CATCH22. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. proline dehydrogenase 1 PRODH ENSG00000100033 NA
9388 The protein encoded by this gene has substantial phospholipase activity and may be involved in lipoprotein metabolism and vascular biology. This protein is designated a member of the TG lipase family by its sequence and characteristic lid region which provides substrate specificity for enzymes of the TG lipase family. lipase G, endothelial type LIPG ENSG00000101670 NA
NA NA NA NA ENSG00000250606 TRUE
3945 This gene encodes the B subunit of lactate dehydrogenase enzyme, which catalyzes the interconversion of pyruvate and lactate with concomitant interconversion of NADH and NAD+ in a post-glycolysis process. Alternatively spliced transcript variants have been found for this gene. Recent studies have shown that a C-terminally extended isoform is produced by use of an alternative in-frame translation termination codon via a stop codon readthrough mechanism, and that this isoform is localized in the peroxisomes. Mutations in this gene are associated with lactate dehydrogenase B deficiency. Pseudogenes have been identified on chromosomes X, 5 and 13. lactate dehydrogenase B LDHB ENSG00000111716 NA
440387 NA chymotrypsinogen B2 CTRB2 ENSG00000168928 NA
5360 The protein encoded by this gene is one of at least two lipid transfer proteins found in human plasma. The encoded protein transfers phospholipids from triglyceride-rich lipoproteins to high density lipoprotein (HDL). In addition to regulating the size of HDL particles, this protein may be involved in cholesterol metabolism. At least two transcript variants encoding different isoforms have been found for this gene. phospholipid transfer protein PLTP ENSG00000100979 NA
84676 This gene encodes a member of the RING zinc finger protein family found in striated muscle and iris. The product of this gene is an E3 ubiquitin ligase that localizes to the Z-line and M-line lattices of myofibrils. This protein plays an important role in the atrophy of skeletal and cardiac muscle and is required for the degradation of myosin heavy chain proteins, myosin light chain, myosin binding protein, and for muscle-type creatine kinase. tripartite motif containing 63 TRIM63 ENSG00000158022 NA
8048 This gene encodes a member of the CSRP family of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this protein is found in a group of proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Mutations in this gene are thought to cause heritable forms of hypertrophic cardiomyopathy (HCM) and dilated cardiomyopathy (DCM) in humans. Alternatively spliced transcript variants with different 5’ UTR, but encoding the same protein, have been found for this gene. cysteine and glycine rich protein 3 CSRP3 ENSG00000129170 NA
80740 LY6G6C belongs to a cluster of leukocyte antigen-6 (LY6) genes located in the major histocompatibility complex (MHC) class III region on chromosome 6. Members of the LY6 superfamily typically contain 70 to 80 amino acids, including 8 to 10 cysteines. Most LY6 proteins are attached to the cell surface by a glycosylphosphatidylinositol (GPI) anchor that is directly involved in signal transduction (Mallya et al., 2002 [PubMed 12079290]). lymphocyte antigen 6 complex, locus G6C LY6G6C ENSG00000204421 NA
146433 Interleukin-34 is a cytokine that promotes the differentiation and viability of monocytes and macrophages through the colony-stimulating factor-1 receptor (CSF1R; MIM 164770) (Lin et al., 2008 [PubMed 18467591]). interleukin 34 IL34 ENSG00000157368 NA
140862 NA isthmin 1, angiogenesis inhibitor ISM1 ENSG00000101230 NA
2114 This gene encodes a transcription factor which regulates genes involved in development and apoptosis. The encoded protein is also a protooncogene and shown to be involved in regulation of telomerase. A pseudogene of this gene is located on the X chromosome. Alternative splicing results in multiple transcript variants. ETS proto-oncogene 2, transcription factor ETS2 ENSG00000157557 NA
1504 The protein encoded by this gene is one of a family of serine proteases that is secreted into the gastrointestinal tract as an inactive precursor, which is activated by proteolytic cleavage with trypsin. chymotrypsinogen B1 CTRB1 ENSG00000168925 NA
ENSG00000254373 NA NA RP11-34P1.2 ENSG00000254373 NA
375791 NA cysteine rich tail 1 CYSRT1 ENSG00000197191 NA
5271 The superfamily of high molecular weight serine proteinase inhibitors (serpins) regulate a diverse set of intracellular and extracellular processes such as complement activation, fibrinolysis, coagulation, cellular differentiation, tumor suppression, apoptosis, and cell migration. Serpins are characterized by a well-conserved tertiary structure that consists of 3 beta sheets and 8 or 9 alpha helices (Huber and Carrell, 1989 [PubMed 2690952]). A critical portion of the molecule, the reactive center loop, connects beta sheets A and C. Protease inhibitor-8 (PI8; SERPINB8) is a member of the ov-serpin subfamily, which, relative to the archetypal serpin PI1 (MIM 107400), is characterized by a high degree of homology to chicken ovalbumin, lack of N- and C-terminal extensions, absence of a signal peptide, and a serine rather than an asparagine residue at the penultimate position (summary by Bartuski et al., 1997 [PubMed 9268635]). serpin family B member 8 SERPINB8 ENSG00000166401 NA
27254 NA cold shock domain containing C2 CSDC2 ENSG00000172346 NA
53833 IL20RB and IL20RA (MIM 605620) form a heterodimeric receptor for interleukin-20 (IL20; MIM 605619) (Blumberg et al., 2001 [PubMed 11163236]). interleukin 20 receptor subunit beta IL20RB ENSG00000174564 NA
4192 This gene encodes a member of a small family of secreted growth factors that binds heparin and responds to retinoic acid. The encoded protein promotes cell growth, migration, and angiogenesis, in particular during tumorigenesis. This gene has been targeted as a therapeutic for a variety of different disorders. Alternatively spliced transcript variants encoding multiple isoforms have been observed. midkine (neurite growth-promoting factor 2) MDK ENSG00000110492 NA
ENSG00000244619 NA NA RP11-315I20.3 ENSG00000244619 NA
8515 Integrins are integral transmembrane glycoproteins composed of noncovalently linked alpha and beta chains. They participate in cell adhesion as well as cell-surface mediated signalling. This gene encodes an integrin alpha chain and is expressed at high levels in chondrocytes, where it is transcriptionally regulated by AP-2epsilon and Ets-1. The protein encoded by this gene binds to collagen. Alternative splicing results in multiple transcript variants. integrin subunit alpha 10 ITGA10 ENSG00000143127 NA
3991 The protein encoded by this gene has a long and a short form, generated by use of alternative translational start codons. The long form is expressed in steroidogenic tissues such as testis, where it converts cholesteryl esters to free cholesterol for steroid hormone production. The short form is expressed in adipose tissue, among others, where it hydrolyzes stored triglycerides to free fatty acids. lipase E, hormone sensitive type LIPE ENSG00000079435 NA
5407 NA pancreatic lipase related protein 1 PNLIPRP1 ENSG00000187021 NA
563 NA alpha-2-glycoprotein 1, zinc-binding AZGP1 ENSG00000160862 NA
9781 The protein encoded by this gene contains a RING finger, a motif known to be involved in protein-DNA and protein-protein interactions. The mouse counterpart of this protein has been shown to interact with Ube2l3/UbcM4, which is an ubiquitin-conjugating enzyme involved in embryonic development. ring finger protein 144A RNF144A ENSG00000151692 NA
51161 NA chromosome 3 open reading frame 18 C3orf18 ENSG00000088543 NA
114880 This gene encodes a member of the oxysterol-binding protein (OSBP) family, a group of intracellular lipid receptors. Most members contain an N-terminal pleckstrin homology domain and a highly conserved C-terminal OSBP-like sterol-binding domain. Transcript variants encoding different isoforms have been identified. oxysterol binding protein like 6 OSBPL6 ENSG00000079156 NA
4625 Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. myosin, heavy chain 7, cardiac muscle, beta MYH7 ENSG00000092054 NA
192683 NA secretory carrier membrane protein 5 SCAMP5 ENSG00000198794 NA
NA NA NA NA ENSG00000156750 TRUE
23475 This gene encodes a key enzyme in catabolism of quinolinate, an intermediate in the tryptophan-nicotinamide adenine dinucleotide pathway. Quinolinate acts as a most potent endogenous excitotoxin to neurons. Elevation of quinolinate levels in the brain has been linked to the pathogenesis of neurodegenerative disorders such as epilepsy, Alzheimer’s disease, and Huntington’s disease. Alternative splicing results in multiple transcript variants. quinolinate phosphoribosyltransferase QPRT ENSG00000103485 NA
3557 The protein encoded by this gene is a member of the interleukin 1 cytokine family. This protein inhibits the activities of interleukin 1, alpha (IL1A) and interleukin 1, beta (IL1B), and modulates a variety of interleukin 1 related immune and inflammatory responses. This gene and five other closely related cytokine genes form a gene cluster spanning approximately 400 kb on chromosome 2. A polymorphism of this gene is reported to be associated with increased risk of osteoporotic fractures and gastric cancer. Several alternatively spliced transcript variants encoding distinct isoforms have been reported. interleukin 1 receptor antagonist IL1RN ENSG00000136689 NA
5318 This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This gene product may regulate the signaling activity of beta-catenin. Two alternately spliced transcripts encoding two protein isoforms have been identified. A processed pseudogene with high similarity to this locus has been mapped to chromosome 12p13. plakophilin 2 PKP2 ENSG00000057294 NA
4501 NA metallothionein 1X MT1X ENSG00000187193 NA
257396 NA uncharacterized LOC257396 LOC257396 ENSG00000247796 NA
84152 This gene encodes a bifunctional signal transduction molecule. Dopaminergic and glutamatergic receptor stimulation regulates its phosphorylation and function as a kinase or phosphatase inhibitor. As a target for dopamine, this gene may serve as a therapeutic target for neurologic and psychiatric disorders. Multiple transcript variants encoding different isoforms have been found for this gene. protein phosphatase 1 regulatory inhibitor subunit 1B PPP1R1B ENSG00000131771 NA
ENSG00000229017 NA long intergenic non-protein coding RNA 1277 LINC01277 ENSG00000229017 NA
23436 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3B has little elastolytic activity. Like most of the human elastases, elastase 3B is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3B preferentially cleaves proteins after alanine residues. Elastase 3B may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1, and excretion of this protein in fecal material is frequently used as a measure of pancreatic function in clinical assays. chymotrypsin like elastase family member 3B CELA3B ENSG00000219073 NA
105370792 NA uncharacterized LOC105370792 LOC105370792 ENSG00000174171 NA
1474 The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and the kininogens. The type 2 cystatin proteins are a class of cysteine proteinase inhibitors found in a variety of human fluids and secretions, where they appear to provide protective functions. This gene encodes a cystatin from the type 2 family, which is down-regulated in metastatic breast tumor cells as compared to primary tumor cells. Loss of expression is likely associated with the progression of a primary tumor to a metastatic phenotype. cystatin E/M CST6 ENSG00000175315 NA
ENSG00000258444 NA NA CTD-2201G16.1 ENSG00000258444 NA
10518 The protein encoded by this gene is similar to that of KIP/CIB, calcineurin B, and calmodulin. The encoded protein is a calcium-binding regulatory protein that interacts with DNA-dependent protein kinase catalytic subunits (DNA-PKcs), and it is involved in photoreceptor cell maintenance. Mutations in this gene cause deafness, autosomal recessive, 48 (DFNB48), and also Usher syndrome 1J (USH1J). Alternative splicing results in multiple transcript variants. calcium and integrin binding family member 2 CIB2 ENSG00000136425 NA
ENSG00000259827 NA NA RP11-343H19.2 ENSG00000259827 NA
440567 This gene has characteristics of a pseudogene derived from the UQCRH gene. However, there is still an open reading frame that could produce a protein of the same or nearly the same size as that of the UQCRH gene, so this gene is being called protein-coding for now. ubiquinol-cytochrome c reductase hinge protein like UQCRHL ENSG00000233954 NA
85379 NA KIAA1671 KIAA1671 ENSG00000197077 NA
222962 This gene encodes a member of the SLC29A/ENT transporter protein family. The encoded membrane protein catalyzes the reuptake of monoamines into presynaptic neurons, thus determining the intensity and duration of monoamine neural signaling. It has been shown to transport several compounds, including serotonin, dopamine, and the neurotoxin 1-methyl-4-phenylpyridinium. Alternative splicing results in multiple transcript variants. solute carrier family 29 member 4 SLC29A4 ENSG00000164638 NA
51032 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Like most of the human elastases, elastase 2B is secreted from the pancreas as a zymogen. In other species, elastase 2B has been shown to preferentially cleave proteins after leucine, methionine, and phenylalanine residues. chymotrypsin like elastase family member 2B CELA2B ENSG00000215704 NA
55244 This gene is located within the Smith-Magenis syndrome region on chromosome 17. It encodes a protein of unknown function. solute carrier family 47 member 1 SLC47A1 ENSG00000142494 NA
29993 NA protein kinase C and casein kinase substrate in neurons 1 PACSIN1 ENSG00000124507 NA
9232 The encoded protein is a homolog of yeast securin proteins, which prevent separins from promoting sister chromatid separation. It is an anaphase-promoting complex (APC) substrate that associates with a separin until activation of the APC. The gene product has transforming activity in vitro and tumorigenic activity in vivo, and the gene is highly expressed in various tumors. The gene product contains 2 PXXP motifs, which are required for its transforming and tumorigenic activities, as well as for its stimulation of basic fibroblast growth factor expression. It also contains a destruction box (D box) that is required for its degradation by the APC. The acidic C-terminal region of the encoded protein can act as a transactivation domain. The gene product is mainly a cytosolic protein, although it partially localizes in the nucleus. Three transcript variants encoding the same protein have been found for this gene. pituitary tumor-transforming 1 PTTG1 ENSG00000164611 NA
221749 NA PX domain containing 1 PXDC1 ENSG00000168994 NA
401491 NA VLDLR antisense RNA 1 VLDLR-AS1 ENSG00000236404 NA
1846 The protein encoded by this gene is a member of the dual specificity protein phosphatase subfamily. These phosphatases inactivate their target kinases by dephosphorylating both the phosphoserine/threonine and phosphotyrosine residues. They negatively regulate members of the mitogen-activated protein (MAP) kinase superfamily (MAPK/ERK, SAPK/JNK, p38), which are associated with cellular proliferation and differentiation. Different members of the family of dual specificity phosphatases show distinct substrate specificities for various MAP kinases, different tissue distribution and subcellular localization, and different modes of inducibility of their expression by extracellular stimuli. This gene product inactivates ERK1, ERK2 and JNK, is expressed in a variety of tissues, and is localized in the nucleus. Two alternatively spliced transcript variants, encoding distinct isoforms, have been observed for this gene. In addition, multiple polyadenylation sites have been reported. dual specificity phosphatase 4 DUSP4 ENSG00000120875 NA
3868 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains and are clustered in a region of chromosome 17q12-q21. This keratin has been coexpressed with keratin 14 in a number of epithelial tissues, including esophagus, tongue, and hair follicles. Mutations in this gene are associated with type 1 pachyonychia congenita, non-epidermolytic palmoplantar keratoderma and unilateral palmoplantar verrucous nevus. keratin 16 KRT16 ENSG00000186832 NA
1056 The protein encoded by this gene is a glycoprotein secreted from the pancreas into the digestive tract and from the lactating mammary gland into human milk. The physiological role of this protein is in cholesterol and lipid-soluble vitamin ester hydrolysis and absorption. This encoded protein promotes large chylomicron production in the intestine. Also its presence in plasma suggests its interactions with cholesterol and oxidized lipoproteins to modulate the progression of atherosclerosis. In pancreatic tumoral cells, this encoded protein is thought to be sequestrated within the Golgi compartment and is probably not secreted. This gene contains a variable number of tandem repeat (VNTR) polymorphism in the coding region that may influence the function of the encoded protein. carboxyl ester lipase CEL ENSG00000170835 NA
374897 NA suprabasin SBSN ENSG00000189001 NA
64714 Protein disulfide isomerases (EC 5.3.4.1), such as PDIP, are endoplasmic reticulum (ER) resident proteins that catalyze protein folding and thiol-disulfide interchange reactions (Desilva et al., 1996 [PubMed 8561901]). protein disulfide isomerase family A member 2 PDIA2 ENSG00000185615 NA
1647 This gene is a member of a group of genes whose transcript levels are increased following stressful growth arrest conditions and treatment with DNA-damaging agents. The protein encoded by this gene responds to environmental stresses by mediating activation of the p38/JNK pathway via MTK1/MEKK4 kinase. The DNA damage-induced transcription of this gene is mediated by both p53-dependent and -independent mechanisms. Alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. growth arrest and DNA damage inducible alpha GADD45A ENSG00000116717 NA
6676 The mammalian sperm flagellum contains two cytoskeletal structures associated with the axoneme: the outer dense fibers surrounding the axoneme in the midpiece and principal piece and the fibrous sheath surrounding the outer dense fibers in the principal piece of the tail. Defects in these structures are associated with abnormal tail morphology, reduced sperm motility, and infertility. In the rat, the protein encoded by this gene associates with an outer dense fiber protein via a leucine zipper motif and localizes to the microtubules of the manchette and axoneme during sperm tail development. Alternative splicing results in multiple transcript variants encoding different isoforms. sperm associated antigen 4 SPAG4 ENSG00000061656 NA
7038 Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. thyroglobulin TG ENSG00000042832 NA
8513 This gene encodes gastric lipase, an enzyme involved in the digestion of dietary triglycerides in the gastrointestinal tract, and responsible for 30% of fat digestion processes occurring in human. It is secreted by gastric chief cells in the fundic mucosa of the stomach, and it hydrolyzes the ester bonds of triglycerides under acidic pH conditions. The gene is a member of a conserved gene family of lipases that play distinct roles in neutral lipid metabolism. Several transcript variants encoding different isoforms have been found for this gene. lipase F, gastric type LIPF ENSG00000182333 NA
7060 The protein encoded by this gene belongs to the thrombospondin protein family. Thrombospondin family members are adhesive glycoproteins that mediate cell-to-cell and cell-to-matrix interactions. This protein forms a pentamer and can bind to heparin and calcium. It is involved in local signaling in the developing and adult nervous system, and it contributes to spinal sensitization and neuropathic pain states. This gene is activated during the stromal response to invasive breast cancer. It may also play a role in inflammatory responses in Alzheimer’s disease. Alternative splicing results in multiple transcript variants. thrombospondin 4 THBS4 ENSG00000113296 NA
4320 Proteins of the matrix metalloproteinase (MMP) family are involved in the breakdown of extracellular matrix in normal physiological processes, such as embryonic development, reproduction, and tissue remodeling, as well as in disease processes, such as arthritis and metastasis. Most MMP’s are secreted as inactive proproteins which are activated when cleaved by extracellular proteinases. However, the enzyme encoded by this gene is activated intracellularly by furin within the constitutive secretory pathway. Also in contrast to other MMP’s, this enzyme cleaves alpha 1-proteinase inhibitor but weakly degrades structural proteins of the extracellular matrix. matrix metallopeptidase 11 MMP11 ENSG00000099953 NA
653499 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. Differential and in situ hybridization studies indicate that this lectin is specifically expressed in keratinocytes and found mainly in stratified squamous epithelium. A duplicate copy of this gene (GeneID:3963) is found adjacent to, but on the opposite strand on chromosome 19. galectin 7B LGALS7B ENSG00000178934 NA
NA NA NA NA ENSG00000165862 TRUE
4606 This gene encodes a member of the myosin-binding protein C family. This family includes the fast-, slow- and cardiac-type isoforms, each of which is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The protein encoded by this locus is referred to as the fast-type isoform. Mutations in the related but distinct genes encoding the slow-type and cardiac-type isoforms have been associated with distal arthrogryposis, type 1 and hypertrophic cardiomyopathy, respectively. myosin binding protein C, fast type MYBPC2 ENSG00000086967 NA
65124 NA sosondowah ankyrin repeat domain family member C SOWAHC ENSG00000198142 NA
57559 NA STAM binding protein like 1 STAMBPL1 ENSG00000138134 NA
5228 This gene encodes a growth factor found in placenta which is homologous to vascular endothelial growth factor. Alternatively spliced transcripts encoding different isoforms have been found for this gene. placental growth factor PGF ENSG00000119630 NA
4734 NA neural precursor cell expressed, developmentally down-regulated 4, E3 ubiquitin protein ligase NEDD4 ENSG00000069869 NA
ENSG00000268896 NA NA RP11-256I23.1 ENSG00000268896 NA
56952 NA phosphoribosyl transferase domain containing 1 PRTFDC1 ENSG00000099256 NA
5799 This gene encodes a protein with sequence similarity to receptor-like protein tyrosine phosphatases. However, tyrosine phosphatase activity has not been experimentally validated for this protein. Studies of the rat ortholog suggest that the encoded protein may instead function as a phosphatidylinositol phosphatase with the ability to dephosphorylate phosphatidylinositol 3-phosphate and phosphatidylinositol 4,5-diphosphate, and this function may be involved in the regulation of insulin secretion. This protein has been identified as an autoantigen in insulin-dependent diabetes mellitus. Alternative splicing results in multiple transcript variants. protein tyrosine phosphatase, receptor type N2 PTPRN2 ENSG00000155093 NA
8557 Sarcomere assembly is regulated by the muscle protein titin. Titin is a giant elastic protein with kinase activity that extends half the length of a sarcomere. It serves as a scaffold to which myofibrils and other muscle related proteins are attached. This gene encodes a protein found in striated and cardiac muscle that binds to the titin Z1-Z2 domains and is a substrate of titin kinase, interactions thought to be critical to sarcomere assembly. Mutations in this gene are associated with limb-girdle muscular dystrophy type 2G. titin-cap TCAP ENSG00000173991 NA
161247 FIT1 belongs to an evolutionarily conserved family of proteins involved in fat storage (Kadereit et al., 2008 [PubMed 18160536]). fat storage inducing transmembrane protein 1 FITM1 ENSG00000139914 NA
51560 NA RAB6B, member RAS oncogene family RAB6B ENSG00000154917 NA
57476 NA GRAM domain containing 1B GRAMD1B ENSG00000023171 NA
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 8, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
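The export step is repeated for each factor with only the factor index changing. A minimal sketch of how it could be wrapped, assuming a hypothetical helper `write_factor_genes()` and a hypothetical `out_dir` argument (neither is part of the original analysis); the demo writes to a temporary directory rather than `../utilities/`:

```r
# Hypothetical helper (an assumption, not the author's code): write the queried
# Ensembl IDs for factor k, one per line, into out_dir.
write_factor_genes <- function(ids, k, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(ids, path, col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)  # return the file path so the caller can check it
}

# Demo with two query IDs copied from the Factor 8 table above
demo_ids  <- c("ENSG00000119630", "ENSG00000069869")
demo_path <- write_factor_genes(demo_ids, 8, tempdir())
readLines(demo_path)
```

In the report itself the call would presumably be `write_factor_genes(out$query, 8, "../utilities/GTEX2013_sparse_fac_voom")`, which keeps the file naming identical to the explicit `paste0()` version.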

Factor 9 Annotations

out <- mygene::queryMany(gene_list[9,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id name query summary notfound
CHGA 1113 chromogranin A ENSG00000100604 The protein encoded by this gene is a member of the chromogranin/secretogranin family of neuroendocrine secretory proteins. It is found in secretory vesicles of neurons and endocrine cells. This gene product is a precursor to three biologically active peptides; vasostatin, pancreastatin, and parastatin. These peptides act as autocrine or paracrine negative modulators of the neuroendocrine system. Two other peptides, catestatin and chromofungin, have antimicrobial activity and antifungal activity, respectively. Two transcript variants encoding different isoforms have been found for this gene. NA
PRSS3 5646 protease, serine 3 ENSG00000010438 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is expressed in the brain and pancreas and is resistant to common trypsin inhibitors. It is active on peptide linkages involving the carboxyl group of lysine or arginine. This gene is localized to the locus of T cell receptor beta variable orphans on chromosome 9. Four transcript variants encoding different isoforms have been described for this gene. NA
KIF5A 3798 kinesin family member 5A ENSG00000155980 This gene encodes a member of the kinesin family of proteins. Members of this family are part of a multisubunit complex that functions as a microtubule motor in intracellular organelle transport. Mutations in this gene cause autosomal dominant spastic paraplegia 10. NA
CPA2 1358 carboxypeptidase A2 ENSG00000158516 Three different forms of human pancreatic procarboxypeptidase A have been isolated. The encoded protein represents the A2 form, which is a monomeric protein with different biochemical properties from the A1 and A3 forms. The A2 form of pancreatic procarboxypeptidase acts on aromatic C-terminal residues and is a secreted protein. NA
EPCAM 4072 epithelial cell adhesion molecule ENSG00000119888 This gene encodes a carcinoma-associated antigen and is a member of a family that includes at least two type I membrane proteins. This antigen is expressed on most normal epithelial cells and gastrointestinal carcinomas and functions as a homotypic calcium-independent cell adhesion molecule. The antigen is being used as a target for immunotherapy treatment of human carcinomas. Mutations in this gene result in congenital tufting enteropathy. NA
ST3GAL6 10402 ST3 beta-galactoside alpha-2,3-sialyltransferase 6 ENSG00000064225 The protein encoded by this gene is a member of the sialyltransferase family. Members of this family are enzymes that transfer sialic acid from the activated cytidine 5’-monophospho-N-acetylneuraminic acid to terminal positions on sialylated glycolipids (gangliosides) or to the N- or O-linked sugar chains of glycoproteins. This protein has high specificity for neolactotetraosylceramide and neolactohexaosylceramide as glycolipid substrates and may contribute to the formation of selectin ligands and sialyl Lewis X, a carbohydrate important for cell-to-cell recognition and a blood group antigen. NA
ATP1B1 481 ATPase Na+/K+ transporting subunit beta 1 ENSG00000143153 The protein encoded by this gene belongs to the family of Na+/K+ and H+/K+ ATPases beta chain proteins, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The beta subunit regulates, through assembly of alpha/beta heterodimers, the number of sodium pumps transported to the plasma membrane. The glycoprotein subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes a beta 1 subunit. Alternatively spliced transcript variants encoding different isoforms have been described, but their biological validity is not known. NA
PCOLCE2 26577 procollagen C-endopeptidase enhancer 2 ENSG00000163710 NA NA
HES6 55502 hes family bHLH transcription factor 6 ENSG00000144485 This gene encodes a member of a subfamily of basic helix-loop-helix transcription repressors that have homology to the Drosophila enhancer of split genes. Members of this gene family regulate cell differentiation in numerous cell types. The protein encoded by this gene functions as a cofactor, interacting with other transcription factors through a tetrapeptide domain in its C-terminus. Alternatively spliced transcript variants encoding different isoforms have been described. NA
SNPH 9751 syntaphilin ENSG00000101298 Syntaxin-1, synaptobrevin/VAMP, and SNAP25 interact to form the SNARE complex, which is required for synaptic vesicle docking and fusion. The protein encoded by this gene is membrane-associated and inhibits SNARE complex formation by binding free syntaxin-1. Expression of this gene appears to be brain-specific. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
NEURL1 9148 neuralized E3 ubiquitin protein ligase 1 ENSG00000107954 NA NA
AQP9 366 aquaporin 9 ENSG00000103569 The aquaporins are a family of water-selective membrane channels. This gene encodes a member of a subset of aquaporins called the aquaglyceroporins. This protein allows passage of a broad range of noncharged solutes and also stimulates urea transport and osmotic water permeability. This protein may also facilitate the uptake of glycerol in hepatic tissue. The encoded protein may also play a role in specialized leukocyte functions such as immunological response and bactericidal activity. Alternate splicing results in multiple transcript variants. NA
PRUNE2 158471 prune homolog 2 ENSG00000106772 The protein encoded by this gene belongs to the B-cell CLL/lymphoma 2 and adenovirus E1B 19 kDa interacting family, whose members play roles in many cellular processes including apoptosis, cell transformation, and synaptic function. Several functions for this protein have been demonstrated including suppression of Ras homolog family member A activity, which results in reduced stress fiber formation and suppression of oncogenic cellular transformation. A high molecular weight isoform of this protein has also been shown to colocalize with Adaptor protein complex 2, beta-Adaptin and endodermal markers, suggesting an involvement in post-endocytic trafficking. In prostate cancer cells, this gene acts as a tumor suppressor and its expression is regulated by prostate cancer antigen 3, a non-protein coding gene on the opposite DNA strand in an intron of this gene. Prostate cancer antigen 3 regulates levels of this gene through formation of a double-stranded RNA that undergoes adenosine deaminase acting on RNA-dependent adenosine-to-inosine RNA editing. Alternative splicing results in multiple transcript variants. NA
PGC 5225 progastricsin ENSG00000096088 This gene encodes an aspartic proteinase that belongs to the peptidase family A1. The encoded protein is a digestive enzyme that is produced in the stomach and constitutes a major component of the gastric mucosa. This protein is also secreted into the serum. This protein is synthesized as an inactive zymogen that includes a highly basic prosegment. This enzyme is converted into its active mature form at low pH by sequential cleavage of the prosegment that is carried out by the enzyme itself. Polymorphisms in this gene are associated with susceptibility to gastric cancers. Serum levels of this enzyme are used as a biomarker for certain gastric diseases including Helicobacter pylori related gastritis. Alternate splicing results in multiple transcript variants. A pseudogene of this gene is found on chromosome 1. NA
PHKA1P1 ENSG00000232882 phosphorylase kinase, alpha 1 pseudogene 1 ENSG00000232882 NA NA
LIN7A 8825 lin-7 homolog A, crumbs cell polarity complex component ENSG00000111052 The protein encoded by this gene is involved in generating and maintaining the asymmetric distribution of channels and receptors at the cell membrane. The encoded protein also is required for the localization of some specific channels and can be part of a protein complex that couples synaptic vesicle exocytosis to cell adhesion in the brain. NA
PIANP 196500 PILR alpha associated neural protein ENSG00000139200 This gene encodes a ligand for the paired immunoglobin-like type 2 receptor alpha, and so may be involved in immune regulation. Alternate splicing results in multiple transcript variants encoding different proteins. NA
IGHA1 ENSG00000211895 immunoglobulin heavy constant alpha 1 ENSG00000211895 NA NA
GPX2 2877 glutathione peroxidase 2 ENSG00000176153 This gene is a member of the glutathione peroxidase family and encodes a selenium-dependent glutathione peroxidase that is one of two isoenzymes responsible for the majority of the glutathione-dependent hydrogen peroxide-reducing activity in the epithelium of the gastrointestinal tract. The protein encoded by this locus contains a selenocysteine (Sec) residue encoded by the UGA codon, which normally signals translation termination. Alternatively spliced transcript variants have been described. NA
IGHA2 ENSG00000211890 immunoglobulin heavy constant alpha 2 (A2m marker) ENSG00000211890 NA NA
KCNMA1 3778 potassium calcium-activated channel subfamily M alpha 1 ENSG00000156113 MaxiK channels are large conductance, voltage and calcium-sensitive potassium channels which are fundamental to the control of smooth muscle tone and neuronal excitability. MaxiK channels can be formed by 2 subunits: the pore-forming alpha subunit, which is the product of this gene, and the modulatory beta subunit. Intracellular calcium regulates the physical association between the alpha and beta subunits. Alternatively spliced transcript variants encoding different isoforms have been identified. NA
NA NA NA ENSG00000156750 NA TRUE
PSCA 8000 prostate stem cell antigen ENSG00000167653 This gene encodes a glycosylphosphatidylinositol-anchored cell membrane glycoprotein. In addition to being highly expressed in the prostate it is also expressed in the bladder, placenta, colon, kidney, and stomach. This gene is up-regulated in a large proportion of prostate cancers and is also detected in cancers of the bladder and pancreas. This gene includes a polymorphism that results in an upstream start codon in some individuals; this polymorphism is thought to be associated with a risk for certain gastric and bladder cancers. Alternative splicing results in multiple transcript variants. NA
MYL2 4633 myosin light chain 2 ENSG00000111245 This gene encodes the regulatory light chain associated with cardiac myosin beta (or slow) heavy chain. Ca2+ triggers the phosphorylation of regulatory light chain that in turn triggers contraction. Mutations in this gene are associated with mid-left ventricular chamber type hypertrophic cardiomyopathy. NA
TMEM158 25907 transmembrane protein 158 (gene/pseudogene) ENSG00000249992 Constitutive activation of the Ras pathway triggers an irreversible proliferation arrest reminiscent of replicative senescence. Transcription of this gene is upregulated in response to activation of the Ras pathway, but not under other conditions that induce senescence. The encoded protein is similar to a rat cell surface receptor proposed to function in a neuronal survival pathway. An allelic polymorphism in this gene results in both functional and non-functional (frameshifted) alleles; the reference genome represents the functional allele. NA
RP11-244O19.1 ENSG00000261534 NA ENSG00000261534 NA NA
RGS9 8787 regulator of G-protein signaling 9 ENSG00000108370 This gene encodes a member of the RGS family of GTPase activating proteins that function in various signaling pathways by accelerating the deactivation of G proteins. This protein is anchored to photoreceptor membranes in retinal cells and deactivates G proteins in the rod and cone phototransduction cascades. Mutations in this gene result in bradyopsia. Multiple transcript variants encoding different isoforms have been found for this gene. NA
PITPNM3 83394 PITPNM family member 3 ENSG00000091622 This gene encodes a member of a family of membrane-associated phosphatidylinositol transfer domain-containing proteins. The calcium-binding protein has phosphatidylinositol (PI) transfer activity and interacts with the protein tyrosine kinase PTK2B (also known as PYK2). The protein is homologous to a Drosophila protein that is implicated in the visual transduction pathway in flies. Mutations in this gene result in autosomal dominant cone dystrophy. Multiple transcript variants encoding different isoforms have been found for this gene. NA
IGLL5 100423062 immunoglobulin lambda like polypeptide 5 ENSG00000254709 This gene encodes one of the immunoglobulin lambda-like polypeptides. It is located within the immunoglobulin lambda locus but it does not require somatic rearrangement for expression. The first exon of this gene is unrelated to immunoglobulin variable genes; the second and third exons are the immunoglobulin lambda joining 1 and the immunoglobulin lambda constant 1 gene segments. Alternative splicing results in multiple transcript variants. NA
PROM2 150696 prominin 2 ENSG00000155066 This gene encodes a member of the prominin family of pentaspan membrane glycoproteins. The encoded protein localizes to basal epithelial cells and may be involved in the organization of plasma membrane microdomains. Alternative splicing results in multiple transcript variants. NA
SFN 2810 stratifin ENSG00000175793 NA NA
TMEM59L 25789 transmembrane protein 59 like ENSG00000105696 This gene encodes a predicted type-I membrane glycoprotein. The encoded protein may play a role in functioning of the central nervous system. NA
PIGR 5284 polymeric immunoglobulin receptor ENSG00000162896 This gene is a member of the immunoglobulin superfamily. The encoded poly-Ig receptor binds polymeric immunoglobulin molecules at the basolateral surface of epithelial cells; the complex is then transported across the cell to be secreted at the apical surface. A significant association was found between immunoglobulin A nephropathy and several SNPs in this gene. NA
MYL3 4634 myosin light chain 3 ENSG00000160808 MYL3 encodes myosin light chain 3, an alkali light chain also referred to in the literature as both the ventricular isoform and the slow skeletal muscle isoform. Mutations in MYL3 have been identified as a cause of mid-left ventricular chamber type hypertrophic cardiomyopathy. NA
FUT2 2524 fucosyltransferase 2 ENSG00000176920 The protein encoded by this gene is a Golgi stack membrane protein that is involved in the creation of a precursor of the H antigen, which is required for the final step in the soluble A and B antigen synthesis pathway. This gene is one of two encoding the galactoside 2-L-fucosyltransferase enzyme. Two transcript variants encoding the same protein have been found for this gene. NA
MUC1 4582 mucin 1, cell surface associated ENSG00000185499 This gene encodes a membrane-bound protein that is a member of the mucin family. Mucins are O-glycosylated proteins that play an essential role in forming protective mucous barriers on epithelial surfaces. These proteins also play a role in intracellular signaling. This protein is expressed on the apical surface of epithelial cells that line the mucosal surfaces of many different tissues including lung, breast, stomach and pancreas. This protein is proteolytically cleaved into alpha and beta subunits that form a heterodimeric complex. The N-terminal alpha subunit functions in cell-adhesion and the C-terminal beta subunit is involved in cell signaling. Overexpression, aberrant intracellular localization, and changes in glycosylation of this protein have been associated with carcinomas. This gene is known to contain a highly polymorphic variable number tandem repeats (VNTR) domain. Alternate splicing results in multiple transcript variants. NA
LRRC4B 94030 leucine rich repeat containing 4B ENSG00000131409 NA NA
NEO1 4756 neogenin 1 ENSG00000067141 This gene encodes a cell surface protein that is a member of the immunoglobulin superfamily. The encoded protein consists of four N-terminal immunoglobulin-like domains, six fibronectin type III domains, a transmembrane domain and a C-terminal internal domain that shares homology with the tumor suppressor candidate gene DCC. This protein may be involved in cell growth and differentiation and in cell-cell adhesion. Defects in this gene are associated with cell proliferation in certain cancers. Alternate splicing results in multiple transcript variants. NA
RAB25 57111 RAB25, member RAS oncogene family ENSG00000132698 The protein encoded by this gene is a member of the RAS superfamily of small GTPases. The encoded protein is involved in membrane trafficking and cell survival. This gene has been found to be a tumor suppressor and an oncogene, depending on the context. Two variants, one protein-coding and the other not, have been found for this gene. NA
LIPF 8513 lipase F, gastric type ENSG00000182333 This gene encodes gastric lipase, an enzyme involved in the digestion of dietary triglycerides in the gastrointestinal tract, and responsible for 30% of fat digestion processes occurring in humans. It is secreted by gastric chief cells in the fundic mucosa of the stomach, and it hydrolyzes the ester bonds of triglycerides under acidic pH conditions. The gene is a member of a conserved gene family of lipases that play distinct roles in neutral lipid metabolism. Several transcript variants encoding different isoforms have been found for this gene. NA
ATL1 51062 atlastin GTPase 1 ENSG00000198513 The protein encoded by this gene is a GTPase and a Golgi body transmembrane protein. The encoded protein can form a homotetramer and has been shown to interact with spastin and with mitogen-activated protein kinase kinase kinase kinase 4. This protein may be involved in axonal maintenance as evidenced by the fact that defects in this gene are a cause of spastic paraplegia type 3. Three transcript variants encoding two different isoforms have been found for this gene. NA
CTBP2 1488 C-terminal binding protein 2 ENSG00000175029 This gene produces alternative transcripts encoding two distinct proteins. One protein is a transcriptional repressor, while the other isoform is a major component of specialized synapses known as synaptic ribbons. Both proteins contain a NAD+ binding domain similar to NAD+-dependent 2-hydroxyacid dehydrogenases. A portion of the 3’ untranslated region was used to map this gene to chromosome 21q21.3; however, it was noted that similar loci elsewhere in the genome are likely. Blast analysis shows that this gene is present on chromosome 10. Several transcript variants encoding two different isoforms have been found for this gene. NA
ACSL1 2180 acyl-CoA synthetase long-chain family member 1 ENSG00000151726 The protein encoded by this gene is an isozyme of the long-chain fatty-acid-coenzyme A ligase family. Although differing in substrate specificity, subcellular localization, and tissue distribution, all isozymes of this family convert free long-chain fatty acids into fatty acyl-CoA esters, and thereby play a key role in lipid biosynthesis and fatty acid degradation. Several transcript variants encoding different isoforms have been found for this gene. NA
RP11-6O2.4 ENSG00000261054 NA ENSG00000261054 NA NA
CHST15 51363 carbohydrate (N-acetylgalactosamine 4-sulfate 6-O) sulfotransferase 15 ENSG00000182022 Chondroitin sulfate (CS) is a glycosaminoglycan which is an important structural component of the extracellular matrix and which links to proteins to form proteoglycans. Chondroitin sulfate E (CS-E) is an isomer of chondroitin sulfate in which the C-4 and C-6 hydroxyl groups are sulfated. This gene encodes a type II transmembrane glycoprotein that acts as a sulfotransferase to transfer sulfate to the C-6 hydroxyl group of chondroitin sulfate. This gene has also been identified as being co-expressed with RAG1 in B-cells and as potentially acting as a B-cell surface signaling receptor. Alternative splicing results in multiple transcript variants encoding distinct isoforms. NA
STX11 8676 syntaxin 11 ENSG00000135604 This gene encodes a member of the syntaxin family. Syntaxins have been implicated in the targeting and fusion of intracellular transport vesicles. This family member may regulate protein transport among late endosomes and the trans-Golgi network. Mutations in this gene have been associated with familial hemophagocytic lymphohistiocytosis. NA
NPPA 4878 natriuretic peptide A ENSG00000175206 The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. NA
PDZRN3 23024 PDZ domain containing ring finger 3 ENSG00000121440 This gene encodes a member of the LNX (Ligand of Numb Protein-X) family of RING-type ubiquitin E3 ligases. This protein may function in vascular morphogenesis and the differentiation of adipocytes, osteoblasts and myoblasts. This protein may be targeted for degradation by the human papilloma virus E6 protein. Alternative splicing results in multiple transcript variants. NA
PNLIPRP1 5407 pancreatic lipase related protein 1 ENSG00000187021 NA NA
GALNT12 79695 polypeptide N-acetylgalactosaminyltransferase 12 ENSG00000119514 This gene encodes a member of a family of UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferases, which catalyze the transfer of N-acetylgalactosamine (GalNAc) from UDP-GalNAc to a serine or threonine residue on a polypeptide acceptor in the initial step of O-linked protein glycosylation. Mutations in this gene are associated with an increased susceptibility to colorectal cancer. NA
SCOC-AS1 100129858 SCOC antisense RNA 1 ENSG00000196951 NA NA
NXN 64359 nucleoredoxin ENSG00000167693 This gene encodes a member of the thioredoxin superfamily, a group of small, multifunctional redox-active proteins. Members of this family are characterized by a conserved active motif called the thioredoxin fold that catalyzes disulfide bond formation and isomerization. The encoded protein acts as a redox-dependent regulator of the Wnt signaling pathway and is involved in cell growth and differentiation. NA
APOC1 341 apolipoprotein C1 ENSG00000130208 This gene encodes a member of the apolipoprotein C1 family. This gene is expressed primarily in the liver, and it is activated when monocytes differentiate into macrophages. The encoded protein plays a central role in high density lipoprotein (HDL) and very low density lipoprotein (VLDL) metabolism. This protein has also been shown to inhibit cholesteryl ester transfer protein in plasma. A pseudogene of this gene is located 4 kb downstream in the same orientation, on the same chromosome. This gene is mapped to chromosome 19, where it resides within an apolipoprotein gene cluster. NA
IPO7P2 ENSG00000225674 importin 7 pseudogene 2 ENSG00000225674 NA NA
IGLC1 ENSG00000211675 immunoglobulin lambda constant 1 (Mcg marker) ENSG00000211675 NA NA
CIDEC 63924 cell death inducing DFFA like effector c ENSG00000187288 This gene encodes a member of the cell death-inducing DNA fragmentation factor-like effector family. Members of this family play important roles in apoptosis. The encoded protein promotes lipid droplet formation in adipocytes and may mediate adipocyte apoptosis. This gene is regulated by insulin and its expression is positively correlated with insulin sensitivity. Mutations in this gene may contribute to insulin resistant diabetes. A pseudogene of this gene is located on the short arm of chromosome 3. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. NA
LIPE 3991 lipase E, hormone sensitive type ENSG00000079435 The protein encoded by this gene has a long and a short form, generated by use of alternative translational start codons. The long form is expressed in steroidogenic tissues such as testis, where it converts cholesteryl esters to free cholesterol for steroid hormone production. The short form is expressed in adipose tissue, among others, where it hydrolyzes stored triglycerides to free fatty acids. NA
OLFM4 10562 olfactomedin 4 ENSG00000102837 This gene was originally cloned from human myeloblasts and found to be selectively expressed in inflamed colonic epithelium. This gene encodes a member of the olfactomedin family. The encoded protein is an antiapoptotic factor that promotes tumor growth and is an extracellular matrix glycoprotein that facilitates cell adhesion. NA
CYP2J2 1573 cytochrome P450 family 2 subfamily J member 2 ENSG00000134716 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum and is thought to be the predominant enzyme responsible for epoxidation of endogenous arachidonic acid in cardiac tissue. Multiple transcript variants have been found for this gene. NA
MPP7 143098 membrane palmitoylated protein 7 ENSG00000150054 The protein encoded by this gene is a member of the p55 Stardust family of membrane-associated guanylate kinase (MAGUK) proteins, which function in the establishment of epithelial cell polarity. This family member forms a complex with the polarity protein DLG1 (discs, large homolog 1) and facilitates epithelial cell polarity and tight junction formation. Polymorphisms in this gene are associated with variations in site-specific bone mineral density (BMD). Alternative splicing results in multiple transcript variants. NA
RP11-265D17.2 ENSG00000254680 NA ENSG00000254680 NA NA
PLS1 5357 plastin 1 ENSG00000120756 Plastins are a family of actin-binding proteins that are conserved throughout eukaryote evolution and expressed in most tissues of higher eukaryotes. In humans, two ubiquitous plastin isoforms (L and T) have been identified. The protein encoded by this gene is a third distinct plastin isoform, which is specifically expressed at high levels in the small intestine. Alternatively spliced transcript variants varying in the 5’ UTR, but encoding the same protein, have been found for this gene. A pseudogene of this gene is found on chromosome 11. NA
NA NA NA ENSG00000250606 NA TRUE
RHOU 58480 ras homolog family member U ENSG00000116574 This gene encodes a member of the Rho family of GTPases. This protein can activate PAK1 and JNK1, and can induce filopodium formation and stress fiber dissolution. It may also mediate the effects of WNT1 signaling in the regulation of cell morphology, cytoskeletal organization, and cell proliferation. A non-coding transcript variant of this gene results from naturally occurring read-through transcription between this locus and the neighboring DUSP5P (dual specificity phosphatase 5 pseudogene) locus. NA
C1QTNF3 114899 C1q and tumor necrosis factor related protein 3 ENSG00000082196 NA NA
SLC22A17 51310 solute carrier family 22 member 17 ENSG00000092096 NA NA
KIAA1522 57648 KIAA1522 ENSG00000162522 NA NA
AF001548.6 ENSG00000263065 NA ENSG00000263065 NA NA
RP11-304L19.4 ENSG00000261240 NA ENSG00000261240 NA NA
TMEM54 113452 transmembrane protein 54 ENSG00000121900 NA NA
SOBP 55084 sine oculis binding protein homolog ENSG00000112320 The protein encoded by this gene is a nuclear zinc finger protein that is involved in development of the cochlea. Defects in this gene have also been linked to intellectual disability. NA
S1PR1 1901 sphingosine-1-phosphate receptor 1 ENSG00000170989 The protein encoded by this gene is structurally similar to G protein-coupled receptors and is highly expressed in endothelial cells. It binds the ligand sphingosine-1-phosphate with high affinity and high specificity, and is suggested to be involved in the processes that regulate the differentiation of endothelial cells. Activation of this receptor induces cell-cell adhesion. Alternative splicing results in multiple transcript variants. NA
S100B 6285 S100 calcium binding protein B ENSG00000160307 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21; however, this gene is located at 21q22.3. This protein may function in neurite extension, proliferation of melanoma cells, stimulation of Ca2+ fluxes, inhibition of PKC-mediated phosphorylation, astrocytosis and axonal proliferation, and inhibition of microtubule assembly. Chromosomal rearrangements and altered expression of this gene have been implicated in several neurological, neoplastic, and other types of diseases, including Alzheimer’s disease, Down’s syndrome, epilepsy, amyotrophic lateral sclerosis, melanoma, and type I diabetes. NA
ZNF853 54753 zinc finger protein 853 ENSG00000236609 NA NA
PRRT2 112476 proline rich transmembrane protein 2 ENSG00000167371 This gene encodes a transmembrane protein containing a proline-rich domain in its N-terminal half. Studies in mice suggest that it is predominantly expressed in brain and spinal cord in embryonic and postnatal stages. Mutations in this gene are associated with episodic kinesigenic dyskinesia-1. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
NA NA NA ENSG00000225490 NA TRUE
DDR1 780 discoidin domain receptor tyrosine kinase 1 ENSG00000204580 Receptor tyrosine kinases play a key role in the communication of cells with their microenvironment. These kinases are involved in the regulation of cell growth, differentiation and metabolism. The protein encoded by this gene belongs to a subfamily of tyrosine kinase receptors with homology to Dictyostelium discoideum protein discoidin I in their extracellular domain, and that are activated by various types of collagen. Expression of this protein is restricted to epithelial cells, particularly in the kidney, lung, gastrointestinal tract, and brain. In addition, it has been shown to be significantly overexpressed in several human tumors. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. NA
NUDT8 254552 nudix hydrolase 8 ENSG00000167799 NA NA
RP11-307B6.3 ENSG00000223774 NA ENSG00000223774 NA NA
BRSK1 84446 BR serine/threonine kinase 1 ENSG00000160469 NA NA
DBNDD1 79007 dysbindin (dystrobrevin binding protein 1) domain containing 1 ENSG00000003249 NA NA
RBP1 5947 retinol binding protein 1 ENSG00000114115 This gene encodes the carrier protein involved in the transport of retinol (vitamin A alcohol) from the liver storage site to peripheral tissue. Vitamin A is a fat-soluble vitamin necessary for growth, reproduction, differentiation of epithelial tissues, and vision. Multiple transcript variants encoding different isoforms have been found for this gene. NA
PRSS8 5652 protease, serine 8 ENSG00000052344 This gene encodes a member of the peptidase S1 or chymotrypsin family of serine proteases. The encoded preproprotein is proteolytically processed to generate light and heavy chains that associate via a disulfide bond to form the heterodimeric enzyme. This enzyme is highly expressed in prostate epithelia and is one of several proteolytic enzymes found in seminal fluid. This protease exhibits trypsin-like substrate specificity, cleaving protein substrates at the carboxyl terminus of lysine or arginine residues. The encoded protease partially mediates proteolytic activation of the epithelial sodium channel, a regulator of sodium balance, and may also play a role in epithelial barrier formation. NA
ENPP5 59084 ectonucleotide pyrophosphatase/phosphodiesterase 5 (putative) ENSG00000112796 This gene encodes a type-I transmembrane glycoprotein. Studies in rat suggest the encoded protein may play a role in neuronal cell communications. Alternatively spliced transcript variants have been described. NA
ACOT11 26027 acyl-CoA thioesterase 11 ENSG00000162390 This gene encodes a member of the acyl-CoA thioesterase family which catalyse the conversion of activated fatty acids to the corresponding non-esterified fatty acid and coenzyme A. Expression of a mouse homolog in brown adipose tissue is induced by low temperatures and repressed by warm temperatures. Higher levels of expression of the mouse homolog have been found in obesity-resistant mice compared with obesity-prone mice, suggesting a role of acyl-CoA thioesterase 11 in obesity. Alternative splicing results in transcript variants. NA
DVL1 1855 dishevelled segment polarity protein 1 ENSG00000107404 DVL1, the human homolog of the Drosophila dishevelled gene (dsh) encodes a cytoplasmic phosphoprotein that regulates cell proliferation, acting as a transducer molecule for developmental processes, including segmentation and neuroblast specification. DVL1 is a candidate gene for neuroblastomatous transformation. The Schwartz-Jampel syndrome and Charcot-Marie-Tooth disease type 2A have been mapped to the same region as DVL1. The phenotypes of these diseases may be consistent with defects which might be expected from aberrant expression of a DVL gene during development. NA
IFFO2 126917 intermediate filament family orphan 2 ENSG00000169991 NA NA
ATP9A 10079 ATPase phospholipid transporting 9A (putative) ENSG00000054793 NA NA
MYOZ2 51778 myozenin 2 ENSG00000172399 The protein encoded by this gene belongs to a family of sarcomeric proteins that bind to calcineurin, a phosphatase involved in calcium-dependent signal transduction in diverse cell types. These family members tether calcineurin to alpha-actinin at the z-line of the sarcomere of cardiac and skeletal muscle cells, and thus they are important for calcineurin signaling. Mutations in this gene cause cardiomyopathy familial hypertrophic type 16, a hereditary heart disorder. NA
GNAZ 2781 G protein subunit alpha z ENSG00000128266 The protein encoded by this gene is a member of a G protein subfamily that mediates signal transduction in pertussis toxin-insensitive systems. This encoded protein may play a role in maintaining the ionic balance of perilymphatic and endolymphatic cochlear fluids. NA
B2M 567 beta-2-microglobulin ENSG00000166710 This gene encodes a serum protein found in association with the major histocompatibility complex (MHC) class I heavy chain on the surface of nearly all nucleated cells. The protein has a predominantly beta-pleated sheet structure that can form amyloid fibrils in some pathological conditions. The encoded antimicrobial protein displays antibacterial activity in amniotic fluid. A mutation in this gene has been shown to result in hypercatabolic hypoproteinemia. NA
RP11-11N9.4 ENSG00000247134 NA ENSG00000247134 NA NA
CAND2 23066 cullin associated and neddylation dissociated 2 (putative) ENSG00000144712 NA NA
ITPKA 3706 inositol-trisphosphate 3-kinase A ENSG00000137825 Regulates inositol phosphate metabolism by phosphorylation of second messenger inositol 1,4,5-trisphosphate to Ins(1,3,4,5)P4. The activity of the inositol 1,4,5-trisphosphate 3-kinase is responsible for regulating the levels of a large number of inositol polyphosphates that are important in cellular signaling. Both calcium/calmodulin and protein phosphorylation mechanisms control its activity. It is also a substrate for the cyclic AMP-dependent protein kinase, calcium/calmodulin-dependent protein kinase II, and protein kinase C in vitro. NA
CA9 768 carbonic anhydrase 9 ENSG00000107159 Carbonic anhydrases (CAs) are a large family of zinc metalloenzymes that catalyze the reversible hydration of carbon dioxide. They participate in a variety of biological processes, including respiration, calcification, acid-base balance, bone resorption, and the formation of aqueous humor, cerebrospinal fluid, saliva, and gastric acid. They show extensive diversity in tissue distribution and in their subcellular localization. CA IX is a transmembrane protein and is one of only two tumor-associated carbonic anhydrase isoenzymes known. It is expressed in all clear-cell renal cell carcinoma, but is not detected in normal kidney or most other normal tissues. It may be involved in cell proliferation and transformation. This gene was mapped to 17q21.2 by fluorescence in situ hybridization, however, radiation hybrid mapping localized it to 9p13-p12. NA
RP11-561C5.4 ENSG00000229212 NA ENSG00000229212 NA NA
RP11-120K9.2 ENSG00000259684 NA ENSG00000259684 NA NA
MICAL2 9645 microtubule associated monooxygenase, calponin and LIM domain containing 2 ENSG00000133816 NA NA
PAK1 5058 p21 (RAC1) activated kinase 1 ENSG00000149269 This gene encodes a family member of serine/threonine p21-activating kinases, known as PAK proteins. These proteins are critical effectors that link RhoGTPases to cytoskeleton reorganization and nuclear signaling, and they serve as targets for the small GTP binding proteins Cdc42 and Rac. This specific family member regulates cell motility and morphology. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
NKD1 85407 naked cuticle homolog 1 ENSG00000140807 In the mouse, Nkd is a Dishevelled (see DVL1; MIM 601365)-binding protein that functions as a negative regulator of the Wnt (see WNT1; MIM 164820)-beta-catenin (see MIM 116806)-Tcf (see MIM 602272) signaling pathway. NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_",9,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
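
The query-and-write pattern used for each factor could be wrapped in a small helper. The following is only a sketch: `annotate_factor` and its `query_fun` argument are hypothetical names that are not part of the original analysis, and the default query simply mirrors the `mygene::queryMany` call used throughout this document.

```r
# Hypothetical helper (not in the original analysis): query annotations for
# one factor's genes and write the queried Ensembl IDs to a text file.
annotate_factor <- function(genes, out_file, query_fun = NULL) {
  if (is.null(query_fun)) {
    # Default: the same mygene query used in this document.
    query_fun <- function(ids) {
      mygene::queryMany(ids, scopes = "ensembl.gene",
                        fields = c("name", "summary", "symbol"),
                        species = "human")
    }
  }
  out <- query_fun(genes)
  # One gene ID per line, no header, no quotes (as in the calls above).
  write.table(as.character(out$query), out_file,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(out)
}

# Possible usage, assuming gene_list from the top of this document:
# out <- annotate_factor(gene_list[9, ],
#   paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 9, ".txt"))
```

Passing a different `query_fun` makes it possible to exercise the file-writing step without a network connection.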

Factor 10 Annotations

out <- mygene::queryMany(gene_list[10,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
kable(as.data.frame(out))
summary query name X_id symbol
This gene likely encodes a member of the carboxypeptidase family of proteins. Cloning of a comparable locus in mouse indicates that the encoded protein contains a discoidin domain and a carboxypeptidase domain, but the protein appears to lack residues necessary for carboxypeptidase activity. ENSG00000088882 carboxypeptidase X (M14 family), member 1 56265 CPXM1
NA ENSG00000263065 NA ENSG00000263065 AF001548.6
NA ENSG00000263335 NA ENSG00000263335 AF001548.5
NA ENSG00000101447 family with sequence similarity 83 member D 81610 FAM83D
This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. ENSG00000175084 desmin 1674 DES
The A-kinase anchor proteins (AKAPs) are a group of structurally diverse proteins, which have the common function of binding to the regulatory subunit of protein kinase A (PKA) and confining the holoenzyme to discrete locations within the cell. This gene encodes a member of the AKAP family. The encoded protein is highly expressed in various brain regions and cardiac and skeletal muscle. It is specifically localized to the sarcoplasmic reticulum and nuclear membrane, and is involved in anchoring PKA to the nuclear membrane or sarcoplasmic reticulum. ENSG00000151320 A-kinase anchoring protein 6 9472 AKAP6
NA ENSG00000234638 NA ENSG00000234638 AC053503.6
NA ENSG00000130176 calponin 1 1264 CNN1
The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000133392 myosin, heavy chain 11, smooth muscle 4629 MYH11
This gene encodes a member of the ankyrin repeat and SOCS box-containing (ASB) protein family. These proteins play a role in protein degradation by coupling suppressor of cytokine signalling (SOCS) proteins with the elongin BC complex. The encoded protein is a subunit of a multimeric E3 ubiquitin ligase complex that mediates the degradation of actin-binding proteins. This gene plays a role in retinoic acid-induced growth inhibition and differentiation of myeloid leukemia cells. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000100628 ankyrin repeat and SOCS box containing 2 51676 ASB2
Netrin is included in a family of laminin-related secreted proteins. The function of this gene has not yet been defined; however, netrin is thought to be involved in axon guidance and cell migration during development. Mutations and loss of expression of netrin suggest that variation in netrin may be involved in cancer development. ENSG00000065320 netrin 1 9423 NTN1
This gene encodes a classical cadherin and member of the cadherin superfamily. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate a calcium-dependent cell adhesion molecule and glycoprotein. This protein plays a role in the establishment of left-right asymmetry, development of the nervous system and the formation of cartilage and bone. ENSG00000170558 cadherin 2 1000 CDH2
This gene encodes a member of the alpha/beta hydrolase superfamily. It is imprinted, exhibiting preferential expression from the paternal allele in fetal tissues, and isoform-specific imprinting in lymphocytes. The loss of imprinting of this gene has been linked to certain types of cancer and may be due to promoter switching. The encoded protein may play a role in development. Alternatively spliced transcript variants encoding multiple isoforms have been identified for this gene. Pseudogenes of this gene are located on the short arm of chromosomes 3 and 4, and the long arm of chromosomes 6 and 15. ENSG00000106484 mesoderm specific transcript 4232 MEST
NA ENSG00000235910 APOA1 antisense RNA 104326055 APOA1-AS
This gene encodes a protein that is a member of the dickkopf family. It is a secreted protein with two cysteine rich regions and is involved in embryonic development through its inhibition of the WNT signaling pathway. Elevated levels of DKK1 in bone marrow plasma and peripheral blood are associated with the presence of osteolytic bone lesions in patients with multiple myeloma. ENSG00000107984 dickkopf WNT signaling pathway inhibitor 1 22943 DKK1
This gene encodes apolipoprotein A-I, which is the major protein component of high density lipoprotein (HDL) in plasma. The encoded preproprotein is proteolytically processed to generate the mature protein, which promotes cholesterol efflux from tissues to the liver for excretion, and is a cofactor for lecithin cholesterolacyltransferase (LCAT), an enzyme responsible for the formation of most plasma cholesteryl esters. This gene is closely linked with two other apolipoprotein genes on chromosome 11. Defects in this gene are associated with HDL deficiencies, including Tangier disease, and with systemic non-neuropathic amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein. ENSG00000118137 apolipoprotein A1 335 APOA1
NA ENSG00000174807 CD248 molecule 57124 CD248
NA ENSG00000267060 prostaglandin E synthase 3 (cytosolic)-like 100885848 PTGES3L
NA ENSG00000197380 dishevelled binding antagonist of beta catenin 3 147906 DACT3
NA ENSG00000261054 NA ENSG00000261054 RP11-6O2.4
Cell adhesion molecules (CAMs) are members of the immunoglobulin superfamily. This gene encodes a neuronal cell adhesion molecule with multiple immunoglobulin-like C2-type domains and fibronectin type-III domains. This ankyrin-binding protein is involved in neuron-neuron adhesion and promotes directional signaling during axonal cone growth. This gene is also expressed in non-neural tissues and may play a general role in cell-cell communication via signaling from its intracellular domain to the actin cytoskeleton during directional cell migration. Allelic variants of this gene have been associated with autism and addiction vulnerability. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000091129 neuronal cell adhesion molecule 4897 NRCAM
The protein encoded by this gene coats lipid storage droplets in adipocytes, thereby protecting them until they can be broken down by hormone-sensitive lipase. The encoded protein is the major cAMP-dependent protein kinase substrate in adipocytes and, when unphosphorylated, may play a role in the inhibition of lipolysis. Alternatively spliced transcript variants varying in the 5’ UTR, but encoding the same protein, have been found for this gene. ENSG00000166819 perilipin 1 5346 PLIN1
This gene encodes a protein that binds the cancer-testis antigen Synovial Sarcoma X breakpoint 2 protein. The encoded protein may regulate the activity of Synovial Sarcoma X breakpoint 2 protein in malignant cells. Alternate splicing results in multiple transcript variants. A pseudogene of this gene is found on chromosome 3. ENSG00000117155 SSX family member 2 interacting protein 117178 SSX2IP
This gene encodes a member of the Snail family of C2H2-type zinc finger transcription factors. The encoded protein acts as a transcriptional repressor that binds to E-box motifs and is also likely to repress E-cadherin transcription in breast carcinoma. This protein is involved in epithelial-mesenchymal transitions and has antiapoptotic activity. Mutations in this gene may be associated with sporadic cases of neural tube defects. ENSG00000019549 snail family transcriptional repressor 2 6591 SNAI2
NA ENSG00000182118 family with sequence similarity 89 member A 375061 FAM89A
The giant protein titin, together with its associated proteins, interconnects the major structure of sarcomeres, the M bands and Z discs. The C-terminal end of the titin string extends into the M line, where it binds tightly to M-band constituents of apparent molecular masses of 190 kD (myomesin 1) and 165 kD (myomesin 2). This protein, myomesin 1, like myomesin 2, titin, and other myofibrillar proteins contains structural modules with strong homology to either fibronectin type III (motif I) or immunoglobulin C2 (motif II) domains. Myomesin 1 and myomesin 2 each have a unique N-terminal region followed by 12 modules of motif I or motif II, in the arrangement II-II-I-I-I-I-I-II-II-II-II-II. The two proteins share 50% sequence identity in this repeat-containing region. The head structure formed by these 2 proteins on one end of the titin string extends into the center of the M band. The integrating structure of the sarcomere arises from muscle-specific members of the superfamily of immunoglobulin-like proteins. Alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000101605 myomesin 1 8736 MYOM1
The protein encoded by this gene is a GTPase which belongs to the RAS superfamily of small GTP-binding proteins. Members of this superfamily appear to regulate a diverse array of cellular events, including the control of cell growth, cytoskeletal reorganization, and the activation of protein kinases. Alternative splicing results in multiple transcript variants. ENSG00000169750 ras-related C3 botulinum toxin substrate 3 (rho family, small GTP binding protein Rac3) 5881 RAC3
NA ENSG00000231346 long intergenic non-protein coding RNA 1160 ENSG00000231346 LINC01160
NA ENSG00000239523 MYLK antisense RNA 1 100506826 MYLK-AS1
NA ENSG00000254756 NA ENSG00000254756 RP11-867G23.12
NA ENSG00000166831 RNA binding protein with multiple splicing 2 348093 RBPMS2
Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. ENSG00000163017 actin, gamma 2, smooth muscle, enteric 72 ACTG2
This gene encodes a member of the bombesin-like family of neuropeptides, which negatively regulate eating behavior. The encoded protein may regulate colonic smooth muscle contraction through binding to its cognate receptor, the neuromedin B receptor (NMBR). Polymorphisms of this gene may be associated with hunger, weight gain and obesity. Alternative splicing results in multiple transcript variants. ENSG00000197696 neuromedin B 4828 NMB
NA ENSG00000123689 G0/G1 switch 2 50486 G0S2
Cancer-associated chromosomal changes often involve regions containing fragile sites. This gene maps to a common fragile site on chromosome 7q31.2 designated FRA7G. This gene is similar to mouse Testin, a testosterone-responsive gene encoding a Sertoli cell secretory protein containing three LIM domains. LIM domains are double zinc-finger motifs that mediate protein-protein interactions between transcription factors, cytoskeletal proteins and signaling proteins. This protein is a negative regulator of cell growth and may act as a tumor suppressor. This scaffold protein may also play a role in cell adhesion, cell spreading and in the reorganization of the actin cytoskeleton. Multiple protein isoforms are encoded by transcript variants of this gene. ENSG00000135269 testin LIM domain protein 26136 TES
FABP4 encodes the fatty acid binding protein found in adipocytes. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. It is thought that FABPs' roles include fatty acid uptake, transport, and metabolism. ENSG00000170323 fatty acid binding protein 4 2167 FABP4
This gene encodes a member of the serum amyloid A family of apolipoproteins. The encoded preproprotein is proteolytically processed to generate the mature protein. This protein is a major acute phase protein that is highly expressed in response to inflammation and tissue injury. This protein also plays an important role in HDL metabolism and cholesterol homeostasis. High levels of this protein are associated with chronic inflammatory diseases including atherosclerosis, rheumatoid arthritis, Alzheimer’s disease and Crohn’s disease. This protein may also be a potential biomarker for certain tumors. Alternate splicing results in multiple transcript variants that encode the same protein. A pseudogene of this gene is found on chromosome 11. ENSG00000173432 serum amyloid A1 6288 SAA1
NA ENSG00000124251 TP53 target 5 27296 TP53TG5
This gene encodes a mitogen-responsive phosphoprotein. It is expressed in normal ovarian epithelial cells, but is down-regulated or absent from ovarian carcinoma cell lines, suggesting its role as a tumor suppressor. This protein binds to the SH3 domains of GRB2, an adaptor protein that couples tyrosine kinase receptors to SOS (a guanine nucleotide exchange factor for Ras), via its C-terminal proline-rich sequences, and may thus modulate growth factor/Ras pathways by competing with SOS for binding to GRB2. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000153071 DAB2, clathrin adaptor protein 1601 DAB2
NA ENSG00000253250 chromosome 8 open reading frame 88 100127983 C8orf88
NA ENSG00000174791 Ras and Rab interactor 1 9610 RIN1
NA ENSG00000078804 tumor protein p53 inducible nuclear protein 2 58476 TP53INP2
This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3’ region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts. ENSG00000065534 myosin light chain kinase 4638 MYLK
NA ENSG00000245293 uncharacterized LOC101929595 101929595 LOC101929595
This gene encodes a member of the dynamin subfamily of GTP-binding proteins. The encoded protein possesses unique mechanochemical properties used to tubulate and sever membranes, and is involved in clathrin-mediated endocytosis and other vesicular trafficking processes. Actin and other cytoskeletal proteins act as binding partners for the encoded protein, which can also self-assemble leading to stimulation of GTPase activity. More than sixty highly conserved copies of the 3’ region of this gene are found elsewhere in the genome, particularly on chromosomes Y and 15. Alternatively spliced transcript variants encoding different isoforms have been described. ENSG00000106976 dynamin 1 1759 DNM1
Voltage-gated sodium channels are heteromeric proteins that function in the generation and propagation of action potentials in muscle and neuronal cells. They are composed of one alpha and two beta subunits, where the alpha subunit provides channel activity and the beta-1 subunit modulates the kinetics of channel inactivation. This gene encodes a sodium channel beta-1 subunit. Mutations in this gene result in generalized epilepsy with febrile seizures plus, Brugada syndrome 5, and defects in cardiac conduction. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000105711 sodium voltage-gated channel beta subunit 1 6324 SCN1B
The protein encoded by this gene is a receptor for activated protein C, a serine protease activated by and involved in the blood coagulation pathway. The encoded protein is an N-glycosylated type I membrane protein that enhances the activation of protein C. Mutations in this gene have been associated with venous thromboembolism and myocardial infarction, as well as with late fetal loss during pregnancy. The encoded protein may also play a role in malarial infection and has been associated with cancer. ENSG00000101000 protein C receptor 10544 PROCR
This gene encodes a component of a conserved striatin-interacting phosphatase and kinase complex. Striatin family complexes participate in a variety of cellular processes including signaling, cell cycle control, cell migration, Golgi assembly, and apoptosis. The protein encoded by this gene is a coiled-coil, tail-anchored membrane protein with a single C-terminal transmembrane domain that is posttranslationally inserted into membranes. Mutations in this gene are associated with Brugada syndrome, a cardiac channelopathy. Alternative splicing results in multiple transcript variants. ENSG00000163681 sarcolemma associated protein 7871 SLMAP
Germinal center kinases (GCKs), such as TNIK, are characterized by an N-terminal kinase domain and a C-terminal GCK domain that serves a regulatory function (Fu et al., 1999 [PubMed 10521462]). ENSG00000154310 TRAF2 and NCK interacting kinase 23043 TNIK
NA ENSG00000227591 uncharacterized LOC101930114 101930114 LOC101930114
This gene encodes a member of the synuclein family of proteins which are believed to be involved in the pathogenesis of neurodegenerative diseases. Mutations in this gene have also been associated with breast tumor development. ENSG00000173267 synuclein gamma 6623 SNCG
This gene was identified by gene expression studies in patients with acute myeloid leukemia (AML). The gene is conserved among mammals and is not found in lower organisms. Tissues that express this gene develop from the neuroectoderm. Multiple alternatively spliced transcript variants that encode different proteins have been described for this gene; however, some of the transcript variants are found only in AML cell lines. ENSG00000164929 brain and acute leukemia, cytoplasmic 79870 BAALC
This gene encodes a member of the F-box protein family. This F-box protein interacts with S-phase kinase-associated protein 1A and cullin in order to form SCF complexes which function as ubiquitin ligases. ENSG00000197361 F-box and leucine rich repeat protein 22 283807 FBXL22
This gene encodes a protein that belongs to the glutamate-gated ionic channel family. Glutamate functions as the major excitatory neurotransmitter in the central nervous system through activation of ligand-gated ion channels and G protein-coupled membrane receptors. The protein encoded by this gene forms functional heteromeric kainate-preferring ionic channels with the subunits encoded by related gene family members. Alternative splicing results in multiple transcript variants. ENSG00000105737 glutamate ionotropic receptor kainate type subunit 5 2901 GRIK5
The protein encoded by this gene is an intermediate filament (IF) family member. IF proteins are cytoskeletal proteins that confer resistance to mechanical stress and are encoded by a dispersed multigene family. This protein has been found to form a linkage between desmin, which is a subunit of the IF network, and the extracellular matrix, and provides an important structural support in muscle. Two alternatively spliced variants encoding different isoforms have been described for this gene. ENSG00000182253 synemin 23336 SYNM
This gene belongs to the TIMP gene family. The proteins encoded by this gene family are inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix. The secreted, netrin domain-containing protein encoded by this gene is involved in regulation of platelet aggregation and recruitment and may play a role in hormonal regulation and endometrial tissue remodeling. ENSG00000157150 TIMP metallopeptidase inhibitor 4 7079 TIMP4
NA ENSG00000178803 ADORA2A antisense RNA 1 646023 ADORA2A-AS1
NA ENSG00000230082 PRRT3 antisense RNA 1 100874032 PRRT3-AS1
The protein encoded by this gene is a lysosomal cysteine proteinase involved in bone remodeling and resorption. This protein, which is a member of the peptidase C1 protein family, is predominantly expressed in osteoclasts. However, the encoded protein is also expressed in a significant fraction of human breast cancers, where it could contribute to tumor invasiveness. Mutations in this gene are the cause of pycnodysostosis, an autosomal recessive disease characterized by osteosclerosis and short stature. ENSG00000143387 cathepsin K 1513 CTSK
NA ENSG00000111696 5’-nucleotidase domain containing 3 51559 NT5DC3
NA ENSG00000185437 SH3 domain binding glutamate rich protein 6450 SH3BGR
This gene encodes a member of the muscle segment homeobox gene family. The encoded protein functions as a transcriptional repressor during embryogenesis through interactions with components of the core transcription complex and other homeoproteins. It may also have roles in limb-pattern formation, craniofacial development, particularly odontogenesis, and tumor growth inhibition. Mutations in this gene, which was once known as homeobox 7, have been associated with nonsyndromic cleft lip with or without cleft palate 5, Witkop syndrome, Wolf-Hirschhorn syndrome, and autosomal dominant hypodontia. ENSG00000163132 msh homeobox 1 4487 MSX1
NA ENSG00000138028 cell growth regulator with EF-hand domain 1 10669 CGREF1
This gene encodes a protein with significant sequence similarity to the ligand binding domain of platelet-derived growth factor receptor beta. Mutations in this gene, or deletion of a chromosomal segment containing this gene, are associated with sporadic hepatocellular carcinomas, colorectal cancers, and non-small cell lung cancers. This suggests this gene product may function as a tumor suppressor. ENSG00000104213 platelet derived growth factor receptor like 5157 PDGFRL
Glycerophosphodiester phosphodiesterases (GDPDs; EC 3.1.4.46), such as GDPD5, are involved in glycerol metabolism (Lang et al., 2008 [PubMed 17578682]). ENSG00000158555 glycerophosphodiester phosphodiesterase domain containing 5 81544 GDPD5
NA ENSG00000166823 mesoderm posterior bHLH transcription factor 1 55897 MESP1
NA ENSG00000230289 NA ENSG00000230289 RP11-334J6.6
This gene encodes a member of the cell death-inducing DNA fragmentation factor-like effector family. Members of this family play important roles in apoptosis. The encoded protein promotes lipid droplet formation in adipocytes and may mediate adipocyte apoptosis. This gene is regulated by insulin and its expression is positively correlated with insulin sensitivity. Mutations in this gene may contribute to insulin resistant diabetes. A pseudogene of this gene is located on the short arm of chromosome 3. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. ENSG00000187288 cell death inducing DFFA like effector c 63924 CIDEC
The gene is part of a 3-member transmembrane receptor kinase receptor family with a processed pseudogene distal on chromosome 15. The encoded protein is activated by the products of the growth arrest-specific gene 6 and protein S genes and is involved in controlling cell survival and proliferation, spermatogenesis, immunoregulation and phagocytosis. The encoded protein has also been identified as a cell entry factor for Ebola and Marburg viruses. ENSG00000092445 TYRO3 protein tyrosine kinase 7301 TYRO3
This gene is a member of the guanine nucleotide-binding protein (G protein) gamma family and encodes a lipid-anchored, cell membrane protein. As a member of the heterotrimeric G protein complex, this protein plays a role in this transmembrane signaling system. This protein is also subject to carboxyl-terminal processing. Decreased expression of this gene is associated with splenic marginal zone lymphomas. ENSG00000127920 G protein subunit gamma 11 2791 GNG11
NA ENSG00000188931 cilia and flagella associated protein 126 257177 CFAP126
The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. This encoded protein is a cell surface glycoprotein and is similar in sequence to its family member CD53 antigen. It is known to complex with integrins and other transmembrane 4 superfamily proteins. Alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000214063 tetraspanin 4 7106 TSPAN4
The protein encoded by this gene has a long and a short form, generated by use of alternative translational start codons. The long form is expressed in steroidogenic tissues such as testis, where it converts cholesteryl esters to free cholesterol for steroid hormone production. The short form is expressed in adipose tissue, among others, where it hydrolyzes stored triglycerides to free fatty acids. ENSG00000079435 lipase E, hormone sensitive type 3991 LIPE
This gene encodes a protein which is a member of the formin/diaphanous family of proteins. The gene is ubiquitously expressed but is found in abundance in the spleen. The encoded protein has sequence homology to diaphanous and formin proteins within the Formin Homology (FH)1 and FH2 domains. It also contains a coiled-coil domain, a collagen-like domain, two nuclear localization signals, and several potential PKC and PKA phosphorylation sites. It is a predominantly cytoplasmic protein and is expressed in a variety of human cell lines. Alternative splicing results in multiple transcript variants. ENSG00000135723 formin homology 2 domain containing 1 29109 FHOD1
This locus encodes a protein that may play a role in the cellular response to arterial injury through involvement in vascular remodeling. Mutations at this locus have been associated with Barrett esophagus and esophageal adenocarcinoma. Alternatively spliced transcript variants have been described. ENSG00000164932 collagen triple helix repeat containing 1 115908 CTHRC1
The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. ENSG00000169710 fatty acid synthase 2194 FASN
This gene encodes a protein that activates the nuclear factor kappa B (NFKB1) signaling pathway. Mutations in this gene are associated with autosomal recessive distal spinal muscular atrophy. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000171680 pleckstrin homology and RhoGEF domain containing G5 57449 PLEKHG5
Myosin phosphatase is a protein complex comprised of three subunits: a catalytic subunit (PP1c-delta, protein phosphatase 1, catalytic subunit delta), a large regulatory subunit (MYPT, myosin phosphatase target) and small regulatory subunit (sm-M20). Two isoforms of MYPT have been isolated, MYPT1 and MYPT2, the first of which is widely expressed, and the second of which may be specific to heart, skeletal muscle, and brain. Each of the MYPT isoforms functions to bind PP1c-delta and increase phosphatase activity. This locus encodes both MYPT2 and M20. Alternatively spliced transcript variants encoding different isoforms have been identified. Related pseudogenes have been defined on the Y chromosome. ENSG00000077157 protein phosphatase 1 regulatory subunit 12B 4660 PPP1R12B
Fibrillar collagen types I-III are synthesized as precursor molecules known as procollagens. These precursors contain amino- and carboxyl-terminal peptide extensions known as N- and C-propeptides, respectively, which are cleaved, upon secretion of procollagen from the cell, to yield the mature triple helical, highly structured fibrils. This gene encodes a glycoprotein which binds and drives the enzymatic cleavage of type I procollagen and heightens C-proteinase activity. ENSG00000106333 procollagen C-endopeptidase enhancer 5118 PCOLCE
NA ENSG00000126803 heat shock protein family A (Hsp70) member 2 3306 HSPA2
The protein encoded by this gene is found as a pentamer and is a major substrate for the cAMP-dependent protein kinase in cardiac muscle. The encoded protein is an inhibitor of cardiac muscle sarcoplasmic reticulum Ca(2+)-ATPase in the unphosphorylated state, but inhibition is relieved upon phosphorylation of the protein. The subsequent activation of the Ca(2+) pump leads to enhanced muscle relaxation rates, thereby contributing to the inotropic response elicited in heart by beta-agonists. The encoded protein is a key regulator of cardiac diastolic function. Mutations in this gene are a cause of inherited human dilated cardiomyopathy with refractory congestive heart failure, and also familial hypertrophic cardiomyopathy. ENSG00000198523 phospholamban 5350 PLN
This gene encodes a member of the HOMER family of postsynaptic density scaffolding proteins that share a similar domain structure consisting of an N-terminal Enabled/vasodilator-stimulated phosphoprotein homology 1 domain which mediates protein-protein interactions, and a carboxy-terminal coiled-coil domain and two leucine zipper motifs that are involved in self-oligomerization. The encoded protein binds numerous other proteins including group I metabotropic glutamate receptors, inositol 1,4,5-trisphosphate receptors and amyloid precursor proteins and has been implicated in diverse biological functions such as neuronal signaling, T-cell activation and trafficking of amyloid beta peptides. Alternative splicing results in multiple transcript variants. ENSG00000051128 homer scaffolding protein 3 9454 HOMER3
This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000128591 filamin C 2318 FLNC
NA ENSG00000229894 NA ENSG00000229894 RP11-668G10.2
NA ENSG00000163806 speedy/RINGO cell cycle regulator family member A 245711 SPDYA
The protein encoded by this gene is a member of the RAMP family of single-transmembrane-domain proteins, called receptor (calcitonin) activity modifying proteins (RAMPs). RAMPs are type I transmembrane proteins with an extracellular N terminus and a cytoplasmic C terminus. RAMPs are required to transport calcitonin-receptor-like receptor (CRLR) to the plasma membrane. CRLR, a receptor with seven transmembrane domains, can function as either a calcitonin-gene-related peptide (CGRP) receptor or an adrenomedullin receptor, depending on which members of the RAMP family are expressed. In the presence of this (RAMP1) protein, CRLR functions as a CGRP receptor. The RAMP1 protein is involved in the terminal glycosylation, maturation, and presentation of the CGRP receptor to the cell surface. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000132329 receptor activity modifying protein 1 10267 RAMP1
NA ENSG00000091986 coiled-coil domain containing 80 151887 CCDC80
The protein encoded by this gene plays a direct regulatory role in calcium-ion-dependent exocytosis in both endocrine and exocrine cells and plays a key role in insulin secretion by pancreatic cells. This gene is likely a tumor suppressor. Alternative splicing results in multiple transcript variants encoding distinct isoforms. ENSG00000181031 rabphilin 3A-like (without C2 domains) 9501 RPH3AL
The protein encoded by this gene is a member of the dual specificity protein phosphatase subfamily. These phosphatases inactivate their target kinases by dephosphorylating both the phosphoserine/threonine and phosphotyrosine residues. They negatively regulate members of the mitogen-activated protein (MAP) kinase superfamily (MAPK/ERK, SAPK/JNK, p38), which are associated with cellular proliferation and differentiation. Different members of the family of dual specificity phosphatases show distinct substrate specificities for various MAP kinases, different tissue distribution and subcellular localization, and different modes of inducibility of their expression by extracellular stimuli. This gene product inactivates ERK1, ERK2 and JNK, is expressed in a variety of tissues, and is localized in the nucleus. Two alternatively spliced transcript variants, encoding distinct isoforms, have been observed for this gene. In addition, multiple polyadenylation sites have been reported. ENSG00000120875 dual specificity phosphatase 4 1846 DUSP4
This gene encodes a member of a small family of secreted growth factors that binds heparin and responds to retinoic acid. The encoded protein promotes cell growth, migration, and angiogenesis, in particular during tumorigenesis. This gene has been targeted as a therapeutic for a variety of different disorders. Alternatively spliced transcript variants encoding multiple isoforms have been observed. ENSG00000110492 midkine (neurite growth-promoting factor 2) 4192 MDK
This gene encodes a member of the adenylate kinase family of enzymes. The encoded protein is localized to the mitochondrial matrix. Adenylate kinases regulate the adenine and guanine nucleotide compositions within a cell by catalyzing the reversible transfer of phosphate group among these nucleotides. Five isozymes of adenylate kinase have been identified in vertebrates. Expression of these isozymes is tissue-specific and developmentally regulated. A pseudogene for this gene has been located on chromosome 17. Three transcript variants encoding the same protein have been identified for this gene. Sequence alignment suggests that the gene defined by NM_013410, NM_203464, and NM_001005353 is located on chromosome 1. ENSG00000162433 adenylate kinase 4 205 AK4
NA ENSG00000176909 MEF2 activating motif and SAP domain containing transcriptional regulator 284358 MAMSTR
This gene encodes a member of the peroxisome proliferator-activated receptor (PPAR) subfamily of nuclear receptors. PPARs form heterodimers with retinoid X receptors (RXRs) and these heterodimers regulate transcription of various genes. Three subtypes of PPARs are known: PPAR-alpha, PPAR-delta, and PPAR-gamma. The protein encoded by this gene is PPAR-gamma and is a regulator of adipocyte differentiation. Additionally, PPAR-gamma has been implicated in the pathology of numerous diseases including obesity, diabetes, atherosclerosis and cancer. Alternatively spliced transcript variants that encode different isoforms have been described. ENSG00000132170 peroxisome proliferator activated receptor gamma 5468 PPARG
The A-kinase anchor proteins (AKAPs) are a group of structurally diverse proteins, which have the common function of binding to the regulatory subunit of protein kinase A (PKA) and confining the holoenzyme to discrete locations within the cell. This gene encodes a member of the AKAP family. The encoded protein binds to type I and type II regulatory subunits of PKA and anchors them to the mitochondrion. This protein is speculated to be involved in the cAMP-dependent signal transduction pathway and in directing RNA to a specific cellular compartment. ENSG00000121057 A-kinase anchoring protein 1 8165 AKAP1
NA ENSG00000231050 NA ENSG00000231050 RP1-140A9.1
This gene encodes a member of the POP family of proteins containing three putative transmembrane domains. This gene is expressed in cardiac and skeletal muscle and may play an important role in development of these tissues. The mouse ortholog may be involved in the regeneration of adult skeletal muscle and may act as a cell adhesion molecule in coronary vasculogenesis. Three transcript variants encoding the same protein have been found for this gene. ENSG00000112276 blood vessel epicardial substance 11149 BVES
The protein encoded by this gene is a member of the protein tyrosine phosphatase (PTP) family. PTPs are known to be signaling molecules that regulate a variety of cellular processes including cell growth, differentiation, mitotic cycle, and oncogenic transformation. This PTP contains an extracellular region, a single transmembrane segment and two tandem intracytoplasmic catalytic domains, and thus represents a receptor-type PTP. The extracellular region of this protein is composed of multiple Ig-like and fibronectin type III-like domains. Studies of the similar gene in mice suggested that this PTP may be involved in cell-cell interaction, primary axonogenesis, and axon guidance during embryogenesis. This PTP has been also implicated in the molecular control of adult nerve repair. Four alternatively spliced transcript variants, which encode distinct proteins, have been reported. ENSG00000105426 protein tyrosine phosphatase, receptor type S 5802 PTPRS
This gene encodes an adaptor protein that is composed of two protein-protein interaction domains: a N-terminal PYRIN-PAAD-DAPIN domain (PYD) and a C-terminal caspase-recruitment domain (CARD). The PYD and CARD domains are members of the six-helix bundle death domain-fold superfamily that mediates assembly of large signaling complexes in the inflammatory and apoptotic signaling pathways via the activation of caspase. In normal cells, this protein is localized to the cytoplasm; however, in cells undergoing apoptosis, it forms ball-like aggregates near the nuclear periphery. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000103490 PYD and CARD domain containing 29108 PYCARD
NA ENSG00000167291 TBC1 domain family member 16 125058 TBC1D16
NA ENSG00000243829 NA ENSG00000243829 CTB-33G10.1
# Save the Ensembl gene IDs queried for factor 10, one per line, for downstream use.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 10, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
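
The write step above saves every queried ID, including any that mygene could not match. As a minimal sketch, assuming the query result carries a `query` column and, when some IDs fail to match, a logical `notfound` column, the matched IDs could be separated out before writing. The helper `matched_ids`, the toy data frame, and the temporary output path are all hypothetical and are not part of the SFA scripts.

```r
# Toy stand-in for as.data.frame(out); the real object comes from mygene::queryMany.
out_df <- data.frame(
  query    = c("ENSG00000187288", "ENSG00000180672", "ENSG00000134339"),
  symbol   = c("CIDEC", NA, "SAA2"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the unique query IDs that were matched.
matched_ids <- function(df) {
  # Assumption: the notfound column may be absent when every query matches.
  if (!"notfound" %in% names(df)) return(unique(df$query))
  # NA in notfound means the ID was matched; only TRUE marks a failed query.
  unique(df$query[!(df$notfound %in% TRUE)])
}

ids <- matched_ids(out_df)

# Write to a temporary file here so the sketch does not touch the project tree.
out_file <- file.path(tempdir(), "gene_names_clus_example.txt")
write.table(ids, out_file, col.names = FALSE, row.names = FALSE, quote = FALSE)
```

Whether unmatched IDs should be dropped depends on the downstream tool; keeping them, as the original code does, preserves the full top-feature list.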

Factor 11 Annotations

out <- mygene::queryMany(gene_list[11,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id symbol name query summary notfound
6289 SAA2 serum amyloid A2 ENSG00000134339 NA NA
100528017 SAA2-SAA4 SAA2-SAA4 readthrough ENSG00000255071 This locus represents naturally occurring read-through transcription between the neighboring serum amyloid A2 and serum amyloid A4 genes on chromosome 11. The read-through transcript produces a fusion protein that shares sequence identity with each individual gene product. NA
286 ANK1 ankyrin 1 ENSG00000029534 Ankyrins are a family of proteins that link the integral membrane proteins to the underlying spectrin-actin cytoskeleton and play key roles in activities such as cell motility, activation, proliferation, contact and the maintenance of specialized membrane domains. Multiple isoforms of ankyrin with different affinities for various target proteins are expressed in a tissue-specific, developmentally regulated manner. Most ankyrins are typically composed of three structural domains: an amino-terminal domain containing multiple ankyrin repeats; a central region with a highly conserved spectrin binding domain; and a carboxy-terminal regulatory domain which is the least conserved and subject to variation. Ankyrin 1, the prototype of this family, was first discovered in the erythrocytes, but since has also been found in brain and muscles. Mutations in erythrocytic ankyrin 1 have been associated in approximately half of all patients with hereditary spherocytosis. Complex patterns of alternative splicing in the regulatory domain, giving rise to different isoforms of ankyrin 1 have been described. Truncated muscle-specific isoforms of ankyrin 1 resulting from usage of an alternate promoter have also been identified. NA
7162 TPBG trophoblast glycoprotein ENSG00000146242 This gene encodes a leucine-rich transmembrane glycoprotein that may be involved in cell adhesion. The encoded protein is an oncofetal antigen that is specific to trophoblast cells. In adults this protein is highly expressed in many tumor cells and is associated with poor clinical outcome in numerous cancers. Alternate splicing in the 5’ UTR results in multiple transcript variants that encode the same protein. NA
2938 GSTA1 glutathione S-transferase alpha 1 ENSG00000243955 This gene encodes a member of a family of enzymes that function to add glutathione to target electrophilic compounds, including carcinogens, therapeutic drugs, environmental toxins, and products of oxidative stress. This action is an important step in detoxification of these compounds. This subfamily of enzymes has a particular role in protecting cells from reactive oxygen species and the products of peroxidation. Polymorphisms in this gene influence the ability of individuals to metabolize different drugs. This gene is located in a cluster of similar genes and pseudogenes on chromosome 6. Alternative splicing results in multiple transcript variants. NA
6288 SAA1 serum amyloid A1 ENSG00000173432 This gene encodes a member of the serum amyloid A family of apolipoproteins. The encoded preproprotein is proteolytically processed to generate the mature protein. This protein is a major acute phase protein that is highly expressed in response to inflammation and tissue injury. This protein also plays an important role in HDL metabolism and cholesterol homeostasis. High levels of this protein are associated with chronic inflammatory diseases including atherosclerosis, rheumatoid arthritis, Alzheimer’s disease and Crohn’s disease. This protein may also be a potential biomarker for certain tumors. Alternate splicing results in multiple transcript variants that encode the same protein. A pseudogene of this gene is found on chromosome 11. NA
84617 TUBB6 tubulin beta 6 class V ENSG00000176014 NA NA
ENSG00000237886 NALT1 NOTCH1 associated lncRNA in T-cell acute lymphoblastic leukemia 1 ENSG00000237886 NA NA
7448 VTN vitronectin ENSG00000109072 The protein encoded by this gene is a member of the pexin family. It is found in serum and tissues and promotes cell adhesion and spreading, inhibits the membrane-damaging effect of the terminal cytolytic complement pathway, and binds to several serpin serine protease inhibitors. It is a secreted protein and exists in either a single chain form or a clipped, two chain form held together by a disulfide bond. NA
ENSG00000270670 RP11-248C1.3 NA ENSG00000270670 NA NA
ENSG00000242198 CTD-2235C13.1 NA ENSG00000242198 NA NA
ENSG00000251196 RP11-54F2.1 NA ENSG00000251196 NA NA
23413 NCS1 neuronal calcium sensor 1 ENSG00000107130 This gene is a member of the neuronal calcium sensor gene family, which encode calcium-binding proteins expressed predominantly in neurons. The protein encoded by this gene regulates G protein-coupled receptor phosphorylation in a calcium-dependent manner and can substitute for calmodulin. The protein is associated with secretory granules and modulates synaptic transmission and synaptic plasticity. Multiple transcript variants encoding different isoforms have been found for this gene. NA
ENSG00000233593 RP4-665J23.1 NA ENSG00000233593 NA NA
ENSG00000213830 CFL1P5 cofilin 1 (non-muscle) pseudogene 5 ENSG00000213830 NA NA
ENSG00000240395 RPL5P23 ribosomal protein L5 pseudogene 23 ENSG00000240395 NA NA
4072 EPCAM epithelial cell adhesion molecule ENSG00000119888 This gene encodes a carcinoma-associated antigen and is a member of a family that includes at least two type I membrane proteins. This antigen is expressed on most normal epithelial cells and gastrointestinal carcinomas and functions as a homotypic calcium-independent cell adhesion molecule. The antigen is being used as a target for immunotherapy treatment of human carcinomas. Mutations in this gene result in congenital tufting enteropathy. NA
3778 KCNMA1 potassium calcium-activated channel subfamily M alpha 1 ENSG00000156113 MaxiK channels are large conductance, voltage and calcium-sensitive potassium channels which are fundamental to the control of smooth muscle tone and neuronal excitability. MaxiK channels can be formed by 2 subunits: the pore-forming alpha subunit, which is the product of this gene, and the modulatory beta subunit. Intracellular calcium regulates the physical association between the alpha and beta subunits. Alternatively spliced transcript variants encoding different isoforms have been identified. NA
ENSG00000253364 RP11-731F5.2 NA ENSG00000253364 NA NA
2035 EPB41 erythrocyte membrane protein band 4.1 ENSG00000159023 The protein encoded by this gene, together with spectrin and actin, constitute the red cell membrane cytoskeletal network. This complex plays a critical role in erythrocyte shape and deformability. Mutations in this gene are associated with type 1 elliptocytosis (EL1). Alternatively spliced transcript variants encoding different isoforms have been described for this gene. NA
NA NA NA ENSG00000180672 NA TRUE
845 CASQ2 calsequestrin 2 ENSG00000118729 The protein encoded by this gene specifies the cardiac muscle family member of the calsequestrin family. Calsequestrin is localized to the sarcoplasmic reticulum in cardiac and slow skeletal muscle cells. The protein is a calcium binding protein that stores calcium for muscle function. Mutations in this gene cause stress-induced polymorphic ventricular tachycardia, also referred to as catecholaminergic polymorphic ventricular tachycardia 2 (CPVT2), a disease characterized by bidirectional ventricular tachycardia that may lead to cardiac arrest. NA
336 APOA2 apolipoprotein A2 ENSG00000158874 This gene encodes apolipoprotein (apo-) A-II, which is the second most abundant protein of the high density lipoprotein particles. The protein is found in plasma as a monomer, homodimer, or heterodimer with apolipoprotein D. Defects in this gene may result in apolipoprotein A-II deficiency or hypercholesterolemia. NA
113452 TMEM54 transmembrane protein 54 ENSG00000121900 NA NA
ENSG00000259657 PIGHP1 phosphatidylinositol glycan anchor biosynthesis class H pseudogene 1 ENSG00000259657 NA NA
ENSG00000234638 AC053503.6 NA ENSG00000234638 NA NA
63924 CIDEC cell death inducing DFFA like effector c ENSG00000187288 This gene encodes a member of the cell death-inducing DNA fragmentation factor-like effector family. Members of this family play important roles in apoptosis. The encoded protein promotes lipid droplet formation in adipocytes and may mediate adipocyte apoptosis. This gene is regulated by insulin and its expression is positively correlated with insulin sensitivity. Mutations in this gene may contribute to insulin resistant diabetes. A pseudogene of this gene is located on the short arm of chromosome 3. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. NA
ENSG00000255139 AP000442.1 NA ENSG00000255139 NA NA
857 CAV1 caveolin 1 ENSG00000105974 The scaffolding protein encoded by this gene is the main component of the caveolae plasma membranes found in most cell types. The protein links integrin subunits to the tyrosine kinase FYN, an initiating step in coupling integrins to the Ras-ERK pathway and promoting cell cycle progression. The gene is a tumor suppressor gene candidate and a negative regulator of the Ras-p42/44 mitogen-activated kinase cascade. Caveolin 1 and caveolin 2 are located next to each other on chromosome 7 and express colocalizing proteins that form a stable hetero-oligomeric complex. Mutations in this gene have been associated with Berardinelli-Seip congenital lipodystrophy. Alternatively spliced transcripts encode alpha and beta isoforms of caveolin 1. NA
643529 LINC00865 long intergenic non-protein coding RNA 865 ENSG00000232229 NA NA
ENSG00000224818 RP11-134G8.10 NA ENSG00000224818 NA NA
6799 SULT1A2 sulfotransferase family 1A member 2 ENSG00000197165 Sulfotransferase enzymes catalyze the sulfate conjugation of many hormones, neurotransmitters, drugs, and xenobiotic compounds. These cytosolic enzymes are different in their tissue distributions and substrate specificities. The gene structure (number and length of exons) is similar among family members. This gene encodes one of two phenol sulfotransferases with thermostable enzyme activity. Two alternatively spliced variants that encode the same protein have been described. NA
694 BTG1 B-cell translocation gene 1, anti-proliferative ENSG00000133639 This gene is a member of an anti-proliferative gene family that regulates cell growth and differentiation. Expression of this gene is highest in the G0/G1 phases of the cell cycle and downregulated when cells progressed through G1. The encoded protein interacts with several nuclear receptors, and functions as a coactivator of cell differentiation. This locus has been shown to be involved in a t(8;12)(q24;q22) chromosomal translocation in a case of B-cell chronic lymphocytic leukemia. NA
388121 TNFAIP8L3 TNF alpha induced protein 8 like 3 ENSG00000183578 NA NA
25819 NOCT nocturnin ENSG00000151014 The protein encoded by this gene is highly similar to Nocturnin, a gene identified as a circadian clock regulated gene in Xenopus laevis. This protein and Nocturnin protein share similarity with the C-terminal domain of a yeast transcription factor, carbon catabolite repression 4 (CCR4). The mRNA abundance of a similar gene in mouse has been shown to exhibit circadian rhythmicity, which suggests a role for this protein in clock function or as a circadian clock effector. NA
NA NA NA ENSG00000261337 NA TRUE
64080 RBKS ribokinase ENSG00000171174 This gene encodes a member of the carbohydrate kinase PfkB family. The encoded protein phosphorylates ribose to form ribose-5-phosphate in the presence of ATP and magnesium as a first step in ribose metabolism. Alternative splicing results in multiple transcript variants. NA
ENSG00000236234 AC091132.1 NA ENSG00000236234 NA NA
ENSG00000236213 AC006369.2 NA ENSG00000236213 NA NA
402778 IFITM10 interferon induced transmembrane protein 10 ENSG00000244242 NA NA
3004 GZMM granzyme M ENSG00000197540 Human natural killer (NK) cells and activated lymphocytes express and store a distinct subset of neutral serine proteases together with proteoglycans and other immune effector molecules in large cytoplasmic granules. These serine proteases are collectively termed granzymes and include 4 distinct gene products: granzyme A, granzyme B, granzyme H, and the protein encoded by this gene, granzyme M. Two transcript variants encoding different isoforms have been found for this gene. NA
ENSG00000219085 NPM1P37 nucleophosmin 1 (nucleolar phosphoprotein B23, numatrin) pseudogene 37 ENSG00000219085 NA NA
1674 DES desmin ENSG00000175084 This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. NA
2167 FABP4 fatty acid binding protein 4 ENSG00000170323 FABP4 encodes the fatty acid binding protein found in adipocytes. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. It is thought that FABPs' roles include fatty acid uptake, transport, and metabolism. NA
ENSG00000231346 LINC01160 long intergenic non-protein coding RNA 1160 ENSG00000231346 NA NA
ENSG00000261136 RP11-37C7.3 NA ENSG00000261136 NA NA
ENSG00000239218 RPS20P22 ribosomal protein S20 pseudogene 22 ENSG00000239218 NA NA
84935 MEDAG mesenteric estrogen dependent adipogenesis ENSG00000102802 NA NA
152573 SHISA3 shisa family member 3 ENSG00000178343 NA NA
ENSG00000239884 RN7SL608P RNA, 7SL, cytoplasmic 608, pseudogene ENSG00000239884 NA NA
57460 PPM1H protein phosphatase, Mg2+/Mn2+ dependent 1H ENSG00000111110 NA NA
29091 STXBP6 syntaxin binding protein 6 ENSG00000168952 STXBP6 binds components of the SNARE complex (see MIM 603215) and may be involved in regulating SNARE complex formation (Scales et al., 2002 [PubMed 12145319]). NA
ENSG00000229512 AC068580.5 NA ENSG00000229512 NA NA
3638 INSIG1 insulin induced gene 1 ENSG00000186480 Oxysterols regulate cholesterol homeostasis through the liver X receptor (LXR)- and sterol regulatory element-binding protein (SREBP)-mediated signaling pathways. This gene is an insulin-induced gene. It encodes an endoplasmic reticulum (ER) membrane protein that plays a critical role in regulating cholesterol concentrations in cells. This protein binds to the sterol-sensing domains of SREBP cleavage-activating protein (SCAP) and HMG CoA reductase, and is essential for the sterol-mediated trafficking of the two proteins. Alternatively spliced transcript variants encoding distinct isoforms have been observed. NA
94 ACVRL1 activin A receptor like type 1 ENSG00000139567 This gene encodes a type I cell-surface receptor for the TGF-beta superfamily of ligands. It shares with other type I receptors a high degree of similarity in serine-threonine kinase subdomains, a glycine- and serine-rich region (called the GS domain) preceding the kinase domain, and a short C-terminal tail. The encoded protein, sometimes termed ALK1, shares similar domain structures with other closely related ALK or activin receptor-like kinase proteins that form a subfamily of receptor serine/threonine kinases. Mutations in this gene are associated with hemorrhagic telangiectasia type 2, also known as Rendu-Osler-Weber syndrome 2. NA
ENSG00000234329 RP11-767N6.2 NA ENSG00000234329 NA NA
9609 RAB36 RAB36, member RAS oncogene family ENSG00000100228 NA NA
214 ALCAM activated leukocyte cell adhesion molecule ENSG00000170017 This gene encodes activated leukocyte cell adhesion molecule (ALCAM), also known as CD166 (cluster of differentiation 166), which is a member of a subfamily of immunoglobulin receptors with five immunoglobulin-like domains (VVC2C2C2) in the extracellular domain. This protein binds to T-cell differentiation antigen CD6, and is implicated in the processes of cell adhesion and migration. Multiple alternatively spliced transcript variants encoding different isoforms have been found. NA
56963 RGMA repulsive guidance molecule family member a ENSG00000182175 This gene encodes a member of the repulsive guidance molecule family. The encoded protein is a glycosylphosphatidylinositol-anchored glycoprotein that functions as an axon guidance protein in the developing and adult central nervous system. This protein may also function as a tumor suppressor in some cancers. Alternate splicing results in multiple transcript variants. NA
ENSG00000243708 PLA2G4B phospholipase A2 group IVB ENSG00000243708 NA NA
4625 MYH7 myosin, heavy chain 7, cardiac muscle, beta ENSG00000092054 Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. NA
5617 PRL prolactin ENSG00000172179 This gene encodes the anterior pituitary hormone prolactin. This secreted hormone is a growth regulator for many tissues, including cells of the immune system. It may also play a role in cell survival by suppressing apoptosis, and it is essential for lactation. Alternative splicing results in multiple transcript variants that encode the same protein. NA
8630 HSD17B6 hydroxysteroid 17-beta dehydrogenase 6 ENSG00000025423 The protein encoded by this gene has both oxidoreductase and epimerase activities and is involved in androgen catabolism. The oxidoreductase activity can convert 3 alpha-adiol to dihydrotestosterone, while the epimerase activity can convert androsterone to epi-androsterone. Both reactions use NAD+ as the preferred cofactor. This gene is a member of the retinol dehydrogenase family. NA
ENSG00000266498 RP11-45M22.5 NA ENSG00000266498 NA NA
57007 ACKR3 atypical chemokine receptor 3 ENSG00000144476 This gene encodes a member of the G-protein coupled receptor family. Although this protein was earlier thought to be a receptor for vasoactive intestinal peptide (VIP), it is now considered to be an orphan receptor, in that its endogenous ligand has not been identified. The protein is also a coreceptor for human immunodeficiency viruses (HIV). Translocations involving this gene and HMGA2 on chromosome 12 have been observed in lipomas. NA
122970 ACOT4 acyl-CoA thioesterase 4 ENSG00000177465 NA NA
5178 PEG3 paternally expressed 3 ENSG00000198300 In human, ZIM2 and PEG3 are treated as two distinct genes though they share multiple 5’ exons and a common promoter and both genes are paternally expressed (PMID:15203203). Alternative splicing events connect their shared 5’ exons either with the remaining 4 exons unique to ZIM2, or with the remaining 2 exons unique to PEG3. In contrast, in other mammals ZIM2 does not undergo imprinting and, in mouse, cow, and likely other mammals as well, the ZIM2 and PEG3 genes do not share exons. Human PEG3 protein belongs to the Kruppel C2H2-type zinc finger protein family. PEG3 may play a role in cell proliferation and p53-mediated apoptosis. PEG3 has also shown tumor suppressor activity and tumorigenesis in glioma and ovarian cells. Alternative splicing of this PEG3 gene results in multiple transcript variants encoding distinct isoforms. NA
ENSG00000261759 RP11-626G11.3 NA ENSG00000261759 NA NA
3101 HK3 hexokinase 3 ENSG00000160883 Hexokinases phosphorylate glucose to produce glucose-6-phosphate, the first step in most glucose metabolism pathways. This gene encodes hexokinase 3. Similar to hexokinases 1 and 2, this allosteric enzyme is inhibited by its product glucose-6-phosphate. NA
57732 ZFYVE28 zinc finger FYVE-type containing 28 ENSG00000159733 NA NA
9244 CRLF1 cytokine receptor like factor 1 ENSG00000006016 This gene encodes a member of the cytokine type I receptor family. The protein forms a secreted complex with cardiotrophin-like cytokine factor 1 and acts on cells expressing ciliary neurotrophic factor receptors. The complex can promote survival of neuronal cells. Mutations in this gene result in Crisponi syndrome and cold-induced sweating syndrome. NA
26136 TES testin LIM domain protein ENSG00000135269 Cancer-associated chromosomal changes often involve regions containing fragile sites. This gene maps to a common fragile site on chromosome 7q31.2 designated FRA7G. This gene is similar to mouse Testin, a testosterone-responsive gene encoding a Sertoli cell secretory protein containing three LIM domains. LIM domains are double zinc-finger motifs that mediate protein-protein interactions between transcription factors, cytoskeletal proteins and signaling proteins. This protein is a negative regulator of cell growth and may act as a tumor suppressor. This scaffold protein may also play a role in cell adhesion, cell spreading and in the reorganization of the actin cytoskeleton. Multiple protein isoforms are encoded by transcript variants of this gene. NA
23615 PYY2 peptide YY, 2 (pseudogene) ENSG00000237575 NA NA
10538 BATF basic leucine zipper ATF-like transcription factor ENSG00000156127 The protein encoded by this gene is a nuclear basic leucine zipper protein that belongs to the AP-1/ATF superfamily of transcription factors. The leucine zipper of this protein mediates dimerization with members of the Jun family of proteins. This protein is thought to be a negative regulator of AP-1/ATF transcriptional events. NA
4867 NPHP1 nephrocystin 1 ENSG00000144061 This gene encodes a protein with src homology domain 3 (SH3) patterns. This protein interacts with Crk-associated substrate, and it appears to function in the control of cell division, as well as in cell-cell and cell-matrix adhesion signaling, likely as part of a multifunctional complex localized in actin- and microtubule-based structures. Mutations in this gene cause familial juvenile nephronophthisis type 1, a kidney disorder involving both tubules and glomeruli. Defects in this gene are also associated with Senior-Loken syndrome type 1, also referred to as juvenile nephronophthisis with Leber amaurosis, which is characterized by kidney and eye disease, and with Joubert syndrome type 4, which is characterized by cerebellar ataxia, oculomotor apraxia, psychomotor delay and neonatal breathing abnormalities, sometimes including retinal dystrophy and renal disease. Multiple transcript variants encoding different isoforms have been found for this gene. NA
8048 CSRP3 cysteine and glycine rich protein 3 ENSG00000129170 This gene encodes a member of the CSRP family of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this protein is found in a group of proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Mutations in this gene are thought to cause heritable forms of hypertrophic cardiomyopathy (HCM) and dilated cardiomyopathy (DCM) in humans. Alternatively spliced transcript variants with different 5’ UTR, but encoding the same protein, have been found for this gene. NA
379 ARL4D ADP ribosylation factor like GTPase 4D ENSG00000175906 ADP-ribosylation factor 4D is a member of the ADP-ribosylation factor family of GTP-binding proteins. ARL4D is closely similar to ARL4A and ARL4C and each has a nuclear localization signal and an unusually high guanine nucleotide exchange rate. This protein may play a role in membrane-associated intracellular trafficking. Mutations in this gene have been associated with Bardet-Biedl syndrome (BBS). NA
ENSG00000273018 CTD-2303H24.2 NA ENSG00000273018 NA NA
121227 LRIG3 leucine rich repeats and immunoglobulin like domains 3 ENSG00000139263 NA NA
3949 LDLR low density lipoprotein receptor ENSG00000130164 The low density lipoprotein receptor (LDLR) gene family consists of cell surface proteins involved in receptor-mediated endocytosis of specific ligands. Low density lipoprotein (LDL) is normally bound at the cell membrane and taken into the cell ending up in lysosomes where the protein is degraded and the cholesterol is made available for repression of microsomal enzyme 3-hydroxy-3-methylglutaryl coenzyme A (HMG CoA) reductase, the rate-limiting step in cholesterol synthesis. At the same time, a reciprocal stimulation of cholesterol ester synthesis takes place. Mutations in this gene cause the autosomal dominant disorder, familial hypercholesterolemia. Alternate splicing results in multiple transcript variants. NA
57699 CPNE5 copine 5 ENSG00000124772 Calcium-dependent membrane-binding proteins may regulate molecular events at the interface of the cell membrane and cytoplasm. This gene is one of several genes that encode a calcium-dependent protein containing two N-terminal type II C2 domains and an integrin A domain-like sequence in the C-terminus. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene. More variants may exist, but their full-length natures could not be determined. NA
4481 MSR1 macrophage scavenger receptor 1 ENSG00000038945 This gene encodes the class A macrophage scavenger receptors, which include three different types (1, 2, 3) generated by alternative splicing of this gene. These receptors or isoforms are macrophage-specific trimeric integral membrane glycoproteins and have been implicated in many macrophage-associated physiological and pathological processes including atherosclerosis, Alzheimer’s disease, and host defense. The isoforms type 1 and type 2 are functional receptors and are able to mediate the endocytosis of modified low density lipoproteins (LDLs). The isoform type 3 does not internalize modified LDL (acetyl-LDL) despite having the domain shown to mediate this function in the types 1 and 2 isoforms. It has an altered intracellular processing and is trapped within the endoplasmic reticulum, making it unable to perform endocytosis. The isoform type 3 can inhibit the function of isoforms type 1 and type 2 when co-expressed, indicating a dominant negative effect and suggesting a mechanism for regulation of scavenger receptor activity in macrophages. NA
ENSG00000264924 RP11-799B12.2 NA ENSG00000264924 NA NA
721 C4B complement component 4B (Chido blood group) ENSG00000224389 This gene encodes the basic form of complement factor 4, part of the classical activation pathway. The protein is expressed as a single chain precursor which is proteolytically cleaved into a trimer of alpha, beta, and gamma chains prior to secretion. The trimer provides a surface for interaction between the antigen-antibody complex and other complement components. The alpha chain may be cleaved to release C4 anaphylatoxin, a mediator of local inflammation. Deficiency of this protein is associated with systemic lupus erythematosus. This gene localizes to the major histocompatibility complex (MHC) class III region on chromosome 6. Varying haplotypes of this gene cluster exist, such that individuals may have 1, 2, or 3 copies of this gene. In addition, this gene exists as a long form and a short form due to the presence or absence of a 6.4 kb endogenous HERV-K retrovirus in intron 9. NA
1113 CHGA chromogranin A ENSG00000100604 The protein encoded by this gene is a member of the chromogranin/secretogranin family of neuroendocrine secretory proteins. It is found in secretory vesicles of neurons and endocrine cells. This gene product is a precursor to three biologically active peptides; vasostatin, pancreastatin, and parastatin. These peptides act as autocrine or paracrine negative modulators of the neuroendocrine system. Two other peptides, catestatin and chromofungin, have antimicrobial activity and antifungal activity, respectively. Two transcript variants encoding different isoforms have been found for this gene. NA
2289 FKBP5 FK506 binding protein 5 ENSG00000096060 The protein encoded by this gene is a member of the immunophilin protein family, which play a role in immunoregulation and basic cellular processes involving protein folding and trafficking. This encoded protein is a cis-trans prolyl isomerase that binds to the immunosuppressants FK506 and rapamycin. It is thought to mediate calcineurin inhibition. It also interacts functionally with mature hetero-oligomeric progesterone receptor complexes along with the 90 kDa heat shock protein and P23 protein. This gene has been found to have multiple polyadenylation sites. Alternative splicing results in multiple transcript variants. NA
643866 CBLN3 cerebellin 3 precursor ENSG00000139899 Members of the precerebellin family, such as CBLN3, contain a cerebellin motif (see CBLN1; MIM 600432) and a C-terminal C1q signature domain (see MIM 120550) that mediates trimeric assembly of atypical collagen complexes. However, precerebellins do not contain a collagen motif, suggesting that they are not conventional components of the extracellular matrix (Pang et al., 2000 [PubMed 10964938]). NA
ENSG00000249286 AMD1P3 adenosylmethionine decarboxylase 1 pseudogene 3 ENSG00000249286 NA NA
2012 EMP1 epithelial membrane protein 1 ENSG00000134531 NA NA
10148 EBI3 Epstein-Barr virus induced 3 ENSG00000105246 This gene was identified by its induced expression in B lymphocytes in response to Epstein-Barr virus infection. It encodes a secreted glycoprotein belonging to the hematopoietin receptor family, and heterodimerizes with a 28 kDa protein to form interleukin 27 (IL-27). IL-27 regulates T cell and inflammatory responses, in part by activating the Jak/STAT pathway of CD4+ T cells. NA
211 ALAS1 5’-aminolevulinate synthase 1 ENSG00000023330 This gene encodes the mitochondrial enzyme which catalyzes the rate-limiting step in heme (iron-protoporphyrin) biosynthesis. The enzyme encoded by this gene is the housekeeping enzyme; a separate gene encodes a form of the enzyme that is specific for erythroid tissue. The level of the mature encoded protein is regulated by heme: high levels of heme down-regulate the mature enzyme in mitochondria while low heme levels up-regulate. A pseudogene of this gene is located on chromosome 12. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
ENSG00000262905 RP5-1029F21.2 NA ENSG00000262905 NA NA
10653 SPINT2 serine peptidase inhibitor, Kunitz type, 2 ENSG00000167642 This gene encodes a transmembrane protein with two extracellular Kunitz domains that inhibits a variety of serine proteases. The protein inhibits HGF activator which prevents the formation of active hepatocyte growth factor. This gene is a putative tumor suppressor, and mutations in this gene result in congenital sodium diarrhea. Multiple transcript variants encoding different isoforms have been found for this gene. NA
4256 MGP matrix Gla protein ENSG00000111341 The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. NA
9172 MYOM2 myomesin 2 ENSG00000036448 The giant protein titin, together with its associated proteins, interconnects the major structure of sarcomeres, the M bands and Z discs. The C-terminal end of the titin string extends into the M line, where it binds tightly to M-band constituents of apparent molecular masses of 190 kD and 165 kD. The predicted MYOM2 protein contains 1,465 amino acids. Like MYOM1, MYOM2 has a unique N-terminal domain followed by 12 repeat domains with strong homology to either fibronectin type III or immunoglobulin C2 domains. Protein sequence comparisons suggested that the MYOM2 protein and bovine M protein are identical. NA
3849 KRT2 keratin 2 ENSG00000172867 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. NA
23231 SEL1L3 SEL1L family member 3 ENSG00000091490 NA NA
83716 CRISPLD2 cysteine rich secretory protein LCCL domain containing 2 ENSG00000103196 NA NA
29895 MYLPF myosin light chain, phosphorylatable, fast skeletal muscle ENSG00000180209 NA NA
ENSG00000232450 RP4-730K3.3 NA ENSG00000232450 NA NA
# Save the Factor 11 query IDs (Ensembl gene IDs), one per line.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 11, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
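
The export step above is repeated for each factor, and some queries come back unmatched (rows with `notfound` set to `TRUE`). A minimal sketch of how the step could be wrapped is given below. The helper name `write_factor_genes`, its `drop_notfound` argument, and the use of `tempdir()` as the output directory are all assumptions for illustration and are not part of the original analysis; the toy data frame only mimics the `query`, `symbol`, and `notfound` columns of a `queryMany()` result.

```r
# Hypothetical helper (not in the original analysis): write one factor's
# query IDs to a text file, one ID per line.
write_factor_genes <- function(out, k, dir = tempdir(), drop_notfound = FALSE) {
  df <- as.data.frame(out)
  if (drop_notfound && "notfound" %in% names(df)) {
    # notfound is TRUE for unmatched queries and NA otherwise
    df <- df[is.na(df$notfound) | !df$notfound, , drop = FALSE]
  }
  path <- file.path(dir, paste0("gene_names_clus_", k, ".txt"))
  writeLines(as.character(df$query), path)
  invisible(path)
}

# Toy stand-in for a queryMany() result (assumed column layout),
# using three query IDs that appear in the Factor 12 table.
toy <- data.frame(
  query = c("ENSG00000111206", "ENSG00000034063", "ENSG00000237649"),
  symbol = c("FOXM1", NA, "KIFC1"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

path <- write_factor_genes(toy, 12, drop_notfound = TRUE)
kept <- readLines(path)
print(kept)
```

With `drop_notfound = FALSE` the helper writes every query ID, which matches the behaviour of the `write.table` call above; whether unmatched IDs should be dropped depends on what the downstream tool expects.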

Factor 12 Annotations

out <- mygene::queryMany(gene_list[12, ], scopes = "ensembl.gene", fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id query name summary notfound
FOXM1 2305 ENSG00000111206 forkhead box M1 The protein encoded by this gene is a transcriptional activator involved in cell proliferation. The encoded protein is phosphorylated in M phase and regulates the expression of several cell cycle genes, such as cyclin B1 and cyclin D1. Several transcript variants encoding different isoforms have been found for this gene. NA
RP11-20D14.6 ENSG00000249790 ENSG00000249790 NA NA NA
KIFC1 3833 ENSG00000237649 kinesin family member C1 NA NA
RUNX3 864 ENSG00000020633 runt related transcription factor 3 This gene encodes a member of the runt domain-containing family of transcription factors. A heterodimer of this protein and a beta subunit forms a complex that binds to the core DNA sequence 5’-PYGPYGGT-3’ found in a number of enhancers and promoters, and can either activate or suppress transcription. It also interacts with other transcription factors. It functions as a tumor suppressor, and the gene is frequently deleted or transcriptionally silenced in cancer. Alternative splicing results in multiple transcript variants. NA
PTGDS 5730 ENSG00000107317 prostaglandin D2 synthase The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may be also involved in the regulation of non-rapid eye movement sleep. NA
RP3-342P20.2 ENSG00000228477 ENSG00000228477 NA NA NA
TNC 3371 ENSG00000041982 tenascin C This gene encodes an extracellular matrix protein with a spatially and temporally restricted tissue distribution. This protein is homohexameric with disulfide-linked subunits, and contains multiple EGF-like and fibronectin type-III domains. It is implicated in guidance of migrating neurons as well as axons during development, synaptic plasticity, and neuronal regeneration. NA
GAL3ST4 79690 ENSG00000197093 galactose-3-O-sulfotransferase 4 This gene encodes a member of the galactose-3-O-sulfotransferase protein family. The product of this gene catalyzes sulfonation by transferring a sulfate to the C-3’ position of galactose residues in O-linked glycoproteins. This enzyme is highly specific for core 1 structures, with asialofetuin, Gal-beta-1,3-GalNAc and Gal-beta-1,3 (GlcNAc-beta-1,6)GalNAc being good substrates. NA
CPA1 1357 ENSG00000091704 carboxypeptidase A1 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. NA
SNORD3A 780851 ENSG00000263934 small nucleolar RNA, C/D box 3A U3 RNA, an abundant small nucleolar RNA (snoRNA), is thought to play a role in the processing of ribosomal RNA precursors (Bernstein et al., 1983 [PubMed 6186397]). NA
GAS6-AS1 ENSG00000233695 ENSG00000233695 GAS6 antisense RNA 1 NA NA
MYOZ2 51778 ENSG00000172399 myozenin 2 The protein encoded by this gene belongs to a family of sarcomeric proteins that bind to calcineurin, a phosphatase involved in calcium-dependent signal transduction in diverse cell types. These family members tether calcineurin to alpha-actinin at the z-line of the sarcomere of cardiac and skeletal muscle cells, and thus they are important for calcineurin signaling. Mutations in this gene cause cardiomyopathy familial hypertrophic type 16, a hereditary heart disorder. NA
NA NA ENSG00000034063 NA NA TRUE
CTB-36H16.2 ENSG00000260686 ENSG00000260686 NA NA NA
FBLN2 2199 ENSG00000163520 fibulin 2 This gene encodes an extracellular matrix protein, which belongs to the fibulin family. This protein binds various extracellular ligands and calcium. It may play a role during organ development, in particular, during the differentiation of heart, skeletal and neuronal structures. Alternatively spliced transcript variants encoding different isoforms have been identified. NA
KDELR3 11015 ENSG00000100196 KDEL endoplasmic reticulum protein retention receptor 3 This gene encodes a member of the KDEL endoplasmic reticulum protein retention receptor family. Retention of resident soluble proteins in the lumen of the endoplasmic reticulum (ER) is achieved in both yeast and animal cells by their continual retrieval from the cis-Golgi, or a pre-Golgi compartment. Sorting of these proteins is dependent on a C-terminal tetrapeptide signal, usually lys-asp-glu-leu (KDEL) in animal cells, and his-asp-glu-leu (HDEL) in S. cerevisiae. This process is mediated by a receptor that recognizes, and binds the tetrapeptide-containing protein, and returns it to the ER. In yeast, the sorting receptor encoded by a single gene, ERD2, is a seven-transmembrane protein. Unlike yeast, several human homologs of the ERD2 gene, constituting the KDEL receptor gene family, have been described. KDELR3 was the third member of the family to be identified. Alternate splicing results in multiple transcript variants. NA
APOBEC3C 27350 ENSG00000244509 apolipoprotein B mRNA editing enzyme catalytic subunit 3C This gene is a member of the cytidine deaminase gene family. It is one of seven related genes or pseudogenes found in a cluster thought to result from gene duplication, on chromosome 22. Members of the cluster encode proteins that are structurally and functionally related to the C to U RNA-editing cytidine deaminase APOBEC1. It is thought that the proteins may be RNA editing enzymes and have roles in growth or cell cycle control. NA
AC098614.2 ENSG00000213846 ENSG00000213846 NA NA NA
RMI2 116028 ENSG00000175643 RecQ mediated genome instability 2 RMI2 is a component of the BLM (RECQL3; MIM 604610) complex, which plays a role in homologous recombination-dependent DNA repair and is essential for genome stability (Xu et al., 2008 [PubMed 18923082]). NA
PRR11 55771 ENSG00000068489 proline rich 11 NA NA
TYMS 7298 ENSG00000176890 thymidylate synthetase Thymidylate synthase catalyzes the methylation of deoxyuridylate to deoxythymidylate using 5,10-methylenetetrahydrofolate (methylene-THF) as a cofactor. This function maintains the dTMP (thymidine-5-prime monophosphate) pool critical for DNA replication and repair. The enzyme has been of interest as a target for cancer chemotherapeutic agents. It is considered to be the primary site of action for 5-fluorouracil, 5-fluoro-2-prime-deoxyuridine, and some folate analogs. Expression of this gene and that of a naturally occurring antisense transcript rTSalpha (GeneID:55556) vary inversely when cell-growth progresses from late-log to plateau phase. NA
HMMR 3161 ENSG00000072571 hyaluronan mediated motility receptor The protein encoded by this gene is involved in cell motility. It is expressed in breast tissue and together with other proteins, it forms a complex with BRCA1 and BRCA2, thus is potentially associated with higher risk of breast cancer. Alternatively spliced transcript variants encoding different isoforms have been noted for this gene. NA
RP11-54F2.1 ENSG00000251196 ENSG00000251196 NA NA NA
TOMM5 401505 ENSG00000175768 translocase of outer mitochondrial membrane 5 NA NA
SEMA7A 8482 ENSG00000138623 semaphorin 7A (John Milton Hagen blood group) This gene encodes a member of the semaphorin family of proteins. The encoded preproprotein is proteolytically processed to generate the mature glycosylphosphatidylinositol (GPI)-anchored membrane glycoprotein. The encoded protein is found on activated lymphocytes and erythrocytes and may be involved in immunomodulatory and neuronal processes. The encoded protein carries the John Milton Hagen (JMH) blood group antigens. Mutations in this gene may be associated with reduced bone mineral density (BMD). Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. NA
COPZ2 51226 ENSG00000005243 coatomer protein complex subunit zeta 2 This gene encodes a member of the adaptor complexes small subunit family. The encoded protein is a subunit of the coatomer protein complex, a seven-subunit complex that functions in the formation of COPI-type, non-clathrin-coated vesicles. COPI vesicles function in the retrograde Golgi-to-ER transport of dilysine-tagged proteins. NA
COL6A2 1292 ENSG00000142173 collagen type VI alpha 2 This gene encodes one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The product of this gene contains several domains similar to von Willebrand Factor type A domains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in this gene are associated with Bethlem myopathy and Ullrich scleroatonic muscular dystrophy. Three transcript variants have been identified for this gene. NA
PKP2 5318 ENSG00000057294 plakophilin 2 This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This gene product may regulate the signaling activity of beta-catenin. Two alternately spliced transcripts encoding two protein isoforms have been identified. A processed pseudogene with high similarity to this locus has been mapped to chromosome 12p13. NA
MYBL1 4603 ENSG00000185697 MYB proto-oncogene like 1 NA NA
NUSAP1 51203 ENSG00000137804 nucleolar and spindle associated protein 1 NUSAP1 is a nucleolar-spindle-associated protein that plays a role in spindle microtubule organization (Raemaekers et al., 2003 [PubMed 12963707]). NA
PRSS1 5644 ENSG00000204983 protease, serine 1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. NA
CDCA3 83461 ENSG00000111665 cell division cycle associated 3 NA NA
CCNB2 9133 ENSG00000157456 cyclin B2 Cyclin B2 is a member of the cyclin family, specifically the B-type cyclins. The B-type cyclins, B1 and B2, associate with p34cdc2 and are essential components of the cell cycle regulatory machinery. B1 and B2 differ in their subcellular localization. Cyclin B1 co-localizes with microtubules, whereas cyclin B2 is primarily associated with the Golgi region. Cyclin B2 also binds to transforming growth factor beta RII and thus cyclin B2/cdc2 may play a key role in transforming growth factor beta-mediated cell cycle control. NA
CYB5R2 51700 ENSG00000166394 cytochrome b5 reductase 2 The protein encoded by this gene belongs to the flavoprotein pyridine nucleotide cytochrome reductase family of proteins. Cytochrome b-type NAD(P)H oxidoreductases are implicated in many processes including cholesterol biosynthesis, fatty acid desaturation and elongation, and respiratory burst in neutrophils and macrophages. Cytochrome b5 reductases have soluble and membrane-bound forms that are the product of alternative splicing. In animal cells, the membrane-bound form binds to the endoplasmic reticulum, where it is a member of a fatty acid desaturation complex. Alternative splicing results in multiple transcript variants. NA
TPSAB1 7177 ENSG00000172236 tryptase alpha/beta 1 Tryptases comprise a family of trypsin-like serine proteases, the peptidase family S1. Tryptases are enzymatically active only as heparin-stabilized tetramers, and they are resistant to all known endogenous proteinase inhibitors. Several tryptase genes are clustered on chromosome 16p13.3. These genes are characterized by several distinct features. They have a highly conserved 3’ UTR and contain tandem repeat sequences at the 5’ flank and 3’ UTR which are thought to play a role in regulation of the mRNA stability. These genes have an intron immediately upstream of the initiator Met codon, which separates the site of transcription initiation from protein coding sequence. This feature is characteristic of tryptases but is unusual in other genes. The alleles of this gene exhibit an unusual amount of sequence variation, such that the alleles were once thought to represent two separate genes, alpha and beta 1. Beta tryptases appear to be the main isoenzymes expressed in mast cells; whereas in basophils, alpha tryptases predominate. Tryptases have been implicated as mediators in the pathogenesis of asthma and other allergic and inflammatory disorders. NA
CCDC102B 79839 ENSG00000150636 coiled-coil domain containing 102B NA NA
MMP2 4313 ENSG00000087245 matrix metallopeptidase 2 This gene is a member of the matrix metalloproteinase (MMP) gene family, that are zinc-dependent enzymes capable of cleaving components of the extracellular matrix and molecules involved in signal transduction. The protein encoded by this gene is a gelatinase A, type IV collagenase, that contains three fibronectin type II repeats in its catalytic site that allow binding of denatured type IV and V collagen and elastin. Unlike most MMP family members, activation of this protein can occur on the cell membrane. This enzyme can be activated extracellularly by proteases, or, intracellularly by its S-glutathiolation with no requirement for proteolytical removal of the pro-domain. This protein is thought to be involved in multiple pathways including roles in the nervous system, endometrial menstrual breakdown, regulation of vascularization, and metastasis. Mutations in this gene have been associated with Winchester syndrome and Nodulosis-Arthropathy-Osteolysis (NAO) syndrome. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
ANKRD49 54851 ENSG00000168876 ankyrin repeat domain 49 NA NA
PCOLCE-AS1 100129845 ENSG00000224729 PCOLCE antisense RNA 1 NA NA
NA NA ENSG00000272016 NA NA TRUE
MRC2 9902 ENSG00000011028 mannose receptor C type 2 This gene encodes a member of the mannose receptor family of proteins that contain a fibronectin type II domain and multiple C-type lectin-like domains. The encoded protein plays a role in extracellular matrix remodeling by mediating the internalization and lysosomal degradation of collagen ligands. Expression of this gene may play a role in the tumorigenesis and metastasis of several malignancies including breast cancer, gliomas and metastatic bone disease. NA
RUNX2 860 ENSG00000124813 runt related transcription factor 2 This gene is a member of the RUNX family of transcription factors and encodes a nuclear protein with a Runt DNA-binding domain. This protein is essential for osteoblastic differentiation and skeletal morphogenesis and acts as a scaffold for nucleic acids and regulatory factors involved in skeletal gene expression. The protein can bind DNA both as a monomer or, with more affinity, as a subunit of a heterodimeric complex. Mutations in this gene have been associated with the bone development disorder cleidocranial dysplasia (CCD). Transcript variants that encode different protein isoforms result from the use of alternate promoters as well as alternate splicing. NA
FBLN5 10516 ENSG00000140092 fibulin 5 The protein encoded by this gene is a secreted, extracellular matrix protein containing an Arg-Gly-Asp (RGD) motif and calcium-binding EGF-like domains. It promotes adhesion of endothelial cells through interaction of integrins and the RGD motif. It is prominently expressed in developing arteries but less so in adult vessels. However, its expression is reinduced in balloon-injured vessels and atherosclerotic lesions, notably in intimal vascular smooth muscle cells and endothelial cells. Therefore, the protein encoded by this gene may play a role in vascular development and remodeling. Defects in this gene are a cause of autosomal dominant cutis laxa, autosomal recessive cutis laxa type I (CL type I), and age-related macular degeneration type 3 (ARMD3). NA
PDIA2 64714 ENSG00000185615 protein disulfide isomerase family A member 2 Protein disulfide isomerases (EC 5.3.4.1), such as PDIP, are endoplasmic reticulum (ER) resident proteins that catalyze protein folding and thiol-disulfide interchange reactions (Desilva et al., 1996 [PubMed 8561901]). NA
PPP1R15A 23645 ENSG00000087074 protein phosphatase 1 regulatory subunit 15A This gene is a member of a group of genes whose transcript levels are increased following stressful growth arrest conditions and treatment with DNA-damaging agents. The induction of this gene by ionizing radiation occurs in certain cell lines regardless of p53 status, and its protein response is correlated with apoptosis following ionizing radiation. NA
PKDCC 91461 ENSG00000162878 protein kinase domain containing, cytoplasmic NA NA
VCAN-AS1 ENSG00000249835 ENSG00000249835 VCAN antisense RNA 1 NA NA
EPB41L4B 54566 ENSG00000095203 erythrocyte membrane protein band 4.1 like 4B NA NA
TPX2 22974 ENSG00000088325 TPX2, microtubule nucleation factor NA NA
LRG1 116844 ENSG00000171236 leucine rich alpha-2-glycoprotein 1 The leucine-rich repeat (LRR) family of proteins, including LRG1, have been shown to be involved in protein-protein interaction, signal transduction, and cell adhesion and development. LRG1 is expressed during granulocyte differentiation (O’Donnell et al., 2002 [PubMed 12223515]). NA
GAS6 2621 ENSG00000183087 growth arrest specific 6 This gene encodes a gamma-carboxyglutamic acid (Gla)-containing protein thought to be involved in the stimulation of cell proliferation. This gene is frequently overexpressed in many cancers and has been implicated as an adverse prognostic marker. Elevated protein levels are additionally associated with a variety of disease states, including venous thromboembolic disease, systemic lupus erythematosus, chronic renal failure, and preeclampsia. NA
ACTN2 88 ENSG00000077522 actinin alpha 2 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a muscle-specific, alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Several transcript variants encoding different isoforms have been found for this gene. NA
RP11-395I6.3 ENSG00000260296 ENSG00000260296 NA NA NA
INF2 64423 ENSG00000203485 inverted formin, FH2 and WH2 domain containing This gene represents a member of the formin family of proteins. It is considered a diaphanous formin due to the presence of a diaphanous inhibitory domain located at the N-terminus of the encoded protein. Studies of a similar mouse protein indicate that the protein encoded by this locus may function in polymerization and depolymerization of actin filaments. Mutations at this locus have been associated with focal segmental glomerulosclerosis 5. NA
DEPDC7 91614 ENSG00000121690 DEP domain containing 7 NA NA
TSPAN13 27075 ENSG00000106537 tetraspanin 13 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. NA
FBN1 2200 ENSG00000166147 fibrillin 1 This gene encodes a member of the fibrillin family of proteins. The encoded preproprotein is proteolytically processed to generate two proteins including the extracellular matrix component fibrillin-1 and the protein hormone asprosin. Fibrillin-1 is an extracellular matrix glycoprotein that serves as a structural component of calcium-binding microfibrils. These microfibrils provide force-bearing structural support in elastic and nonelastic connective tissue throughout the body. Asprosin, secreted by white adipose tissue, has been shown to regulate glucose homeostasis. Mutations in this gene are associated with Marfan syndrome and the related MASS phenotype, as well as ectopia lentis syndrome, Weill-Marchesani syndrome, Shprintzen-Goldberg syndrome and neonatal progeroid syndrome. NA
SAMD4A 23034 ENSG00000020577 sterile alpha motif domain containing 4A Sterile alpha motifs (SAMs) in proteins such as SAMD4A are part of an RNA-binding domain that functions as a posttranscriptional regulator by binding to an RNA sequence motif known as the Smaug recognition element, which was named after the Drosophila Smaug protein (Baez and Boccaccio, 2005 [PubMed 16221671]). NA
TCEA3 6920 ENSG00000204219 transcription elongation factor A3 NA NA
PTGS2 5743 ENSG00000073756 prostaglandin-endoperoxide synthase 2 Prostaglandin-endoperoxide synthase (PTGS), also known as cyclooxygenase, is the key enzyme in prostaglandin biosynthesis, and acts both as a dioxygenase and as a peroxidase. There are two isozymes of PTGS: a constitutive PTGS1 and an inducible PTGS2, which differ in their regulation of expression and tissue distribution. This gene encodes the inducible isozyme. It is regulated by specific stimulatory events, suggesting that it is responsible for the prostanoid biosynthesis involved in inflammation and mitogenesis. NA
RP1-68D18.4 ENSG00000255443 ENSG00000255443 NA NA NA
EGR2 1959 ENSG00000122877 early growth response 2 The protein encoded by this gene is a transcription factor with three tandem C2H2-type zinc fingers. Defects in this gene are associated with Charcot-Marie-Tooth disease type 1D (CMT1D), Charcot-Marie-Tooth disease type 4E (CMT4E), and with Dejerine-Sottas syndrome (DSS). Multiple transcript variants encoding two different isoforms have been found for this gene. NA
CTRB2 440387 ENSG00000168928 chymotrypsinogen B2 NA NA
F10 2159 ENSG00000126218 coagulation factor X This gene encodes the vitamin K-dependent coagulation factor X of the blood coagulation cascade. This factor undergoes multiple processing steps before its preproprotein is converted to a mature two-chain form by the excision of the tripeptide RKR. Two chains of the factor are held together by 1 or more disulfide bonds; the light chain contains 2 EGF-like domains, while the heavy chain contains the catalytic domain which is structurally homologous to those of the other hemostatic serine proteases. The mature factor is activated by the cleavage of the activation peptide by factor IXa (in the intrinsic pathway), or by factor VIIa (in the extrinsic pathway). The activated factor then converts prothrombin to thrombin in the presence of factor Va, Ca+2, and phospholipid during blood clotting. Mutations of this gene result in factor X deficiency, a hemorrhagic condition of variable severity. Alternative splicing results in multiple transcript variants encoding different isoforms that may undergo similar proteolytic processing to generate mature polypeptides. NA
RP11-16E18.3 ENSG00000261542 ENSG00000261542 NA NA NA
ZBED6CL 113763 ENSG00000188707 ZBED6 C-terminal like NA NA
AURKA 6790 ENSG00000087586 aurora kinase A The protein encoded by this gene is a cell cycle-regulated kinase that appears to be involved in microtubule formation and/or stabilization at the spindle pole during chromosome segregation. The encoded protein is found at the centrosome in interphase cells and at the spindle poles in mitosis. This gene may play a role in tumor development and progression. A processed pseudogene of this gene has been found on chromosome 1, and an unprocessed pseudogene has been found on chromosome 10. Multiple transcript variants encoding the same protein have been found for this gene. NA
KCNC3 3748 ENSG00000131398 potassium voltage-gated channel subfamily C member 3 The Shaker gene family of Drosophila encodes components of voltage-gated potassium channels and is comprised of four subfamilies. Based on sequence similarity, this gene is similar to one of these subfamilies, namely the Shaw subfamily. The protein encoded by this gene belongs to the delayed rectifier class of channel proteins and is an integral membrane protein that mediates the voltage-dependent potassium ion permeability of excitable membranes. Alternate splicing results in several transcript variants. NA
APOLD1 81575 ENSG00000178878 apolipoprotein L domain containing 1 APOLD1 is an endothelial cell early response protein that may play a role in regulation of endothelial cell signaling and vascular function (Regard et al., 2004 [PubMed 15102925]). NA
TNFAIP6 7130 ENSG00000123610 TNF alpha induced protein 6 The protein encoded by this gene is a secretory protein that contains a hyaluronan-binding domain, and thus is a member of the hyaluronan-binding protein family. The hyaluronan-binding domain is known to be involved in extracellular matrix stability and cell migration. This protein has been shown to form a stable complex with inter-alpha-inhibitor (I alpha I), and thus enhance the serine protease inhibitory activity of I alpha I, which is important in the protease network associated with inflammation. This gene can be induced by proinflammatory cytokines such as tumor necrosis factor alpha and interleukin-1. Enhanced levels of this protein are found in the synovial fluid of patients with osteoarthritis and rheumatoid arthritis. NA
MYL4 4635 ENSG00000198336 myosin light chain 4 Myosin is a hexameric ATPase cellular motor protein. It is composed of two myosin heavy chains, two nonphosphorylatable myosin alkali light chains, and two phosphorylatable myosin regulatory light chains. This gene encodes a myosin alkali light chain that is found in embryonic muscle and adult atria. Two alternatively spliced transcript variants encoding the same protein have been found for this gene. NA
CYP17A1 1586 ENSG00000148795 cytochrome P450 family 17 subfamily A member 1 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. It has both 17alpha-hydroxylase and 17,20-lyase activities and is a key enzyme in the steroidogenic pathway that produces progestins, mineralocorticoids, glucocorticoids, androgens, and estrogens. Mutations in this gene are associated with isolated steroid-17 alpha-hydroxylase deficiency, 17-alpha-hydroxylase/17,20-lyase deficiency, pseudohermaphroditism, and adrenal hyperplasia. NA
PDIK1L 149420 ENSG00000175087 PDLIM1 interacting kinase 1 like NA NA
CACNB1 782 ENSG00000067191 calcium voltage-gated channel auxiliary subunit beta 1 The protein encoded by this gene belongs to the calcium channel beta subunit family. It plays an important role in the calcium channel by modulating G protein inhibition, increasing peak calcium current, controlling the alpha-1 subunit membrane targeting and shifting the voltage dependence of activation and inactivation. Alternative splicing occurs at this locus and three transcript variants encoding three distinct isoforms have been identified. NA
NUDT16P1 152195 ENSG00000246082 nudix hydrolase 16 pseudogene 1 NA NA
NA NA ENSG00000168274 NA NA TRUE
FSCN1 6624 ENSG00000075618 fascin actin-bundling protein 1 This gene encodes a member of the fascin family of actin-binding proteins. Fascin proteins organize F-actin into parallel bundles, and are required for the formation of actin-based cellular protrusions. The encoded protein plays a critical role in cell migration, motility, adhesion and cellular interactions. Expression of this gene is known to be regulated by several microRNAs, and overexpression of this gene may play a role in the metastasis of multiple types of cancer by increasing cell motility. Expression of this gene is also a marker for Reed-Sternberg cells in Hodgkin’s lymphoma. A pseudogene of this gene is located on the long arm of chromosome 15. NA
PRR5L 79899 ENSG00000135362 proline rich 5 like NA NA
CHAD 1101 ENSG00000136457 chondroadherin Chondroadherin is a cartilage matrix protein thought to mediate adhesion of isolated chondrocytes. The protein contains 11 leucine-rich repeats flanked by cysteine-rich regions. The chondroadherin messenger RNA is present in chondrocytes at all ages. NA
MFSD2A 84879 ENSG00000168389 major facilitator superfamily domain containing 2A NA NA
RP11-11N9.4 ENSG00000247134 ENSG00000247134 NA NA NA
INHBA 3624 ENSG00000122641 inhibin beta A subunit The inhibin beta A subunit joins the alpha subunit to form a pituitary FSH secretion inhibitor. Inhibin has been shown to regulate gonadal stromal cell proliferation negatively and to have tumor-suppressor activity. In addition, serum levels of inhibin have been shown to reflect the size of granulosa-cell tumors and can therefore be used as a marker for primary as well as recurrent disease. Because expression in gonadal and various extragonadal tissues may vary severalfold in a tissue-specific fashion, it is proposed that inhibin may be both a growth/differentiation factor and a hormone. Furthermore, the beta A subunit forms a homodimer, activin A, and also joins with a beta B subunit to form a heterodimer, activin AB, both of which stimulate FSH secretion. Finally, it has been shown that the beta A subunit mRNA is identical to the erythroid differentiation factor subunit mRNA and that only one gene for this mRNA exists in the human genome. NA
APOBEC2 10930 ENSG00000124701 apolipoprotein B mRNA editing enzyme catalytic subunit 2 NA NA
MOV10 4343 ENSG00000155363 Mov10 RISC complex RNA helicase NA NA
SLC25A18 83733 ENSG00000182902 solute carrier family 25 member 18 NA NA
NR4A1 3164 ENSG00000123358 nuclear receptor subfamily 4 group A member 1 This gene encodes a member of the steroid-thyroid hormone-retinoid receptor superfamily. Expression is induced by phytohemagglutinin in human lymphocytes and by serum stimulation of arrested fibroblasts. The encoded protein acts as a nuclear transcription factor. Translocation of the protein from the nucleus to mitochondria induces apoptosis. Multiple transcript variants encoding different isoforms have been found for this gene. NA
AK1 203 ENSG00000106992 adenylate kinase 1 This gene encodes an adenylate kinase enzyme involved in energy metabolism and homeostasis of cellular adenine nucleotide ratios in different intracellular compartments. This gene is highly expressed in skeletal muscle, brain and erythrocytes. Certain mutations in this gene resulting in a functionally inadequate enzyme are associated with a rare genetic disorder causing nonspherocytic hemolytic anemia. Alternative splicing of this gene results in multiple transcript variants encoding different isoforms. NA
RP11-701B16.2 ENSG00000258782 ENSG00000258782 NA NA NA
ID2-AS1 100506299 ENSG00000235092 ID2 antisense RNA 1 (head to head) NA NA
SYTL1 84958 ENSG00000142765 synaptotagmin like 1 NA NA
TGM3 7053 ENSG00000125780 transglutaminase 3 Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene consists of two polypeptide chains activated from a single precursor protein by proteolysis. The encoded protein is involved in the later stages of cell envelope formation in the epidermis and hair follicle. NA
RP11-426L16.3 ENSG00000225075 ENSG00000225075 NA NA NA
DOCK10 55619 ENSG00000135905 dedicator of cytokinesis 10 This gene encodes a member of the dedicator of cytokinesis protein family. Members of this family are guanosine nucleotide exchange factors for Rho GTPases and defined by the presence of conserved DOCK-homology regions. The encoded protein belongs to the D (or Zizimin) subfamily of DOCK proteins, which also contain an N-terminal pleckstrin homology domain. Alternatively spliced transcript variants that encode different isoforms have been described. NA
SERPINF2 5345 ENSG00000167711 serpin family F member 2 This gene encodes a member of the serpin family of serine protease inhibitors. The protein is a major inhibitor of plasmin, which degrades fibrin and various other proteins. Consequently, the proper function of this gene has a major role in regulating the blood clotting pathway. Mutations in this gene result in alpha-2-plasmin inhibitor deficiency, which is characterized by severe hemorrhagic diathesis. Multiple transcript variants encoding different isoforms have been found for this gene. NA
PTGES 9536 ENSG00000148344 prostaglandin E synthase The protein encoded by this gene is a glutathione-dependent prostaglandin E synthase. The expression of this gene has been shown to be induced by proinflammatory cytokine interleukin 1 beta (IL1B). Its expression can also be induced by tumor suppressor protein TP53, and may be involved in TP53 induced apoptosis. Knockout studies in mice suggest that this gene may contribute to the pathogenesis of collagen-induced arthritis and mediate acute pain during inflammatory responses. NA
TNNI1 7135 ENSG00000159173 troponin I1, slow skeletal type Troponin proteins associate with tropomyosin and regulate the calcium sensitivity of the myofibril contractile apparatus of striated muscles. Troponin I (TnI), along with troponin T (TnT) and troponin C (TnC), is one of 3 subunits that form the troponin complex of the thin filaments of striated muscle. TnI is the inhibitory subunit; blocking actin-myosin interactions and thereby mediating striated muscle relaxation. The TnI subfamily contains three genes: TnI-skeletal-fast-twitch, TnI-skeletal-slow-twitch, and TnI-cardiac. The TnI-fast and TnI-slow genes are expressed in fast-twitch and slow-twitch skeletal muscle fibers, respectively, while the TnI-cardiac gene is expressed exclusively in cardiac muscle tissue. This gene encodes the Troponin-I-skeletal-slow-twitch protein. This gene is expressed in cardiac and skeletal muscle during early development but is restricted to slow-twitch skeletal muscle fibers in adults. The encoded protein prevents muscle contraction by inhibiting calcium-mediated conformational changes in actin-myosin complexes. NA
AC003075.4 ENSG00000237773 ENSG00000237773 NA NA NA
GLS2 27165 ENSG00000135423 glutaminase 2 The protein encoded by this gene is a mitochondrial phosphate-activated glutaminase that catalyzes the hydrolysis of glutamine to stoichiometric amounts of glutamate and ammonia. Originally thought to be liver-specific, this protein has been found in other tissues as well. Alternative splicing results in multiple transcript variants that encode different isoforms. NA
CTD-2017C7.2 ENSG00000259088 ENSG00000259088 NA NA NA
CNN2P9 ENSG00000213149 ENSG00000213149 calponin 2 pseudogene 9 NA NA
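The table above mixes three kinds of rows: genes with an Entrez ID in X_id, genes for which X_id simply repeats the Ensembl query ID, and queries flagged notfound. A minimal sketch of how one might tally these, using a toy data frame with the same column names (the rows are copied from the table; `classify_hits` is a hypothetical helper, not part of mygene):

```r
# Toy stand-in for as.data.frame(out); rows copied from the table above.
ann <- data.frame(
  symbol   = c("MMP2", "VCAN-AS1", NA, "TPX2"),
  X_id     = c("4313", "ENSG00000249835", NA, "22974"),
  query    = c("ENSG00000087245", "ENSG00000249835",
               "ENSG00000272016", "ENSG00000088325"),
  notfound = c(NA, NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: label each query by what the lookup returned.
classify_hits <- function(df) {
  not_found    <- !is.na(df$notfound) & df$notfound
  ensembl_only <- !not_found & !is.na(df$X_id) & df$X_id == df$query
  ifelse(not_found, "not found",
         ifelse(ensembl_only, "Ensembl ID only", "Entrez ID"))
}

ann$hit_type <- classify_hits(ann)
table(ann$hit_type)
```

Applied to the full annotation table, the same helper would give a quick count of how many of the top genes per factor lack a usable annotation.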
# Save the Ensembl IDs queried for factor 12, one per line.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 12, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
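The write step hard-codes the factor index in the file name, so the same block has to be repeated for each factor. A minimal sketch of a parameterised version, assuming a hypothetical helper `write_factor_genes` with an output-directory argument (neither is part of the original analysis):

```r
# Hypothetical helper: write the query IDs for factor k, one per line,
# to <out_dir>/gene_names_clus_<k>.txt and return the file path.
write_factor_genes <- function(ids, k, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(ids, path, col.names = FALSE, row.names = FALSE, quote = FALSE)
  path
}

# Usage with two IDs taken from the factor 12 table; tempdir() stands in
# for ../utilities/GTEX2013_sparse_fac_voom.
ids <- c("ENSG00000087245", "ENSG00000168876")
written <- write_factor_genes(ids, 12, tempdir())
readLines(written)
```

In the analysis itself the call would presumably take `out$query` and the factor index in place of the toy inputs used here.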

Factor 13 Annotations

out <- mygene::queryMany(gene_list[13, ], scopes = "ensembl.gene",
                         fields = c("name", "summary", "symbol"), species = "human")
## Finished
kable(as.data.frame(out))
name summary X_id query symbol
loricrin This gene encodes loricrin, a major protein component of the cornified cell envelope found in terminally differentiated epidermal cells. Mutations in this gene are associated with Vohwinkel’s syndrome and progressive symmetric erythrokeratoderma, both inherited skin diseases. 4014 ENSG00000203782 LOR
keratin 2 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3849 ENSG00000172867 KRT2
cadherin related family member 1 This gene belongs to the cadherin superfamily of calcium-dependent cell adhesion molecules. The encoded protein is a photoreceptor-specific cadherin that plays a role in outer segment disc morphogenesis. Mutations in this gene are associated with inherited retinal dystrophies. Alternatively spliced transcript variants encoding different isoforms have been identified. 92211 ENSG00000148600 CDHR1
troponin T2, cardiac type The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms, however, the full-length nature of some of these variants has not yet been determined. 7139 ENSG00000118194 TNNT2
dermokine This gene is upregulated in inflammatory diseases, and it was first observed as expressed in the differentiated layers of skin. The most interesting aspect of this gene is the differential use of promoters and terminators to generate isoforms with unique cellular distributions and domain components. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. 93099 ENSG00000161249 DMKN
S100 calcium binding protein A14 This gene encodes a member of the S100 protein family which contains an EF-hand motif and binds calcium. The gene is located in a cluster of S100 genes on chromosome 1. Levels of the encoded protein have been found to be lower in cancerous tissue and associated with metastasis suggesting a tumor suppressor function (PMID: 19956863, 19351828). 57402 ENSG00000189334 S100A14
plakophilin 3 This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This protein may act in cellular desmosome-dependent adhesion and signaling pathways. Two transcript variants encoding different isoforms have been found for this gene. 11187 ENSG00000184363 PKP3
mucin like 1 NA 118430 ENSG00000172551 MUCL1
serum amyloid A1 This gene encodes a member of the serum amyloid A family of apolipoproteins. The encoded preproprotein is proteolytically processed to generate the mature protein. This protein is a major acute phase protein that is highly expressed in response to inflammation and tissue injury. This protein also plays an important role in HDL metabolism and cholesterol homeostasis. High levels of this protein are associated with chronic inflammatory diseases including atherosclerosis, rheumatoid arthritis, Alzheimer’s disease and Crohn’s disease. This protein may also be a potential biomarker for certain tumors. Alternate splicing results in multiple transcript variants that encode the same protein. A pseudogene of this gene is found on chromosome 11. 6288 ENSG00000173432 SAA1
keratin 1 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3848 ENSG00000167768 KRT1
dermcidin This antimicrobial gene encodes a secreted protein that is subsequently processed into mature peptides of distinct biological activities. The C-terminal peptide is constitutively expressed in sweat and has antibacterial and antifungal activities. The N-terminal peptide, also known as diffusible survival evasion peptide, promotes neural cell survival under conditions of severe oxidative stress. A glycosylated form of the N-terminal peptide may be associated with cachexia (muscle wasting) in cancer patients. Alternative splicing results in multiple transcript variants encoding different isoforms. 117159 ENSG00000161634 DCD
nebulette This gene encodes a nebulin like protein that is abundantly expressed in cardiac muscle. The encoded protein binds actin and interacts with thin filaments and Z-line associated proteins in striated muscle. This protein may be involved in cardiac myofibril assembly. A shorter isoform of this protein termed LIM nebulette is expressed in non-muscle cells and may function as a component of focal adhesion complexes. Alternate splicing results in multiple transcript variants. 10529 ENSG00000078114 NEBL
v-myc avian myelocytomatosis viral oncogene lung carcinoma derived homolog NA 4610 ENSG00000116990 MYCL
phosphoenolpyruvate carboxykinase 1 This gene is a main control point for the regulation of gluconeogenesis. The cytosolic enzyme encoded by this gene, along with GTP, catalyzes the formation of phosphoenolpyruvate from oxaloacetate, with the release of carbon dioxide and GDP. The expression of this gene can be regulated by insulin, glucocorticoids, glucagon, cAMP, and diet. Defects in this gene are a cause of cytosolic phosphoenolpyruvate carboxykinase deficiency. A mitochondrial isozyme of the encoded protein also has been characterized. 5105 ENSG00000124253 PCK1
calmodulin like 5 This gene encodes a novel calcium binding protein expressed in the epidermis and related to the calmodulin family of calcium binding proteins. Functional studies with recombinant protein demonstrate it does bind calcium and undergoes a conformational change when it does so. Abundant expression is detected only in reconstructed epidermis and is restricted to differentiating keratinocytes. In addition, it can associate with transglutaminase 3, shown to be a key enzyme in the terminal differentiation of keratinocytes. 51806 ENSG00000178372 CALML5
chromosome 3 open reading frame 52 NA 79669 ENSG00000114529 C3orf52
thioesterase superfamily member 5 NA 284486 ENSG00000196407 THEM5
keratinocyte differentiation associated protein This gene encodes a protein which may function in the regulation of keratinocyte differentiation and maintenance of stratified epithelia. Multiple transcript variants encoding different isoforms have been found for this gene. 388533 ENSG00000188508 KRTDAP
SAA2-SAA4 readthrough This locus represents naturally occurring read-through transcription between the neighboring serum amyloid A2 and serum amyloid A4 genes on chromosome 11. The read-through transcript produces a fusion protein that shares sequence identity with each individual gene product. 100528017 ENSG00000255071 SAA2-SAA4
serum amyloid A2 NA 6289 ENSG00000134339 SAA2
hedgehog acyltransferase-like NA 57467 ENSG00000010282 HHATL
ras homolog family member V NA 171177 ENSG00000104140 RHOV
cadherin 1 This gene encodes a classical cadherin of the cadherin superfamily. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature glycoprotein. This calcium-dependent cell-cell adhesion protein is comprised of five extracellular cadherin repeats, a transmembrane region and a highly conserved cytoplasmic tail. Mutations in this gene are correlated with gastric, breast, colorectal, thyroid and ovarian cancer. Loss of function of this gene is thought to contribute to cancer progression by increasing proliferation, invasion, and/or metastasis. The ectodomain of this protein mediates bacterial adhesion to mammalian cells and the cytoplasmic domain is required for internalization. This gene is present in a gene cluster with other members of the cadherin family on chromosome 16. 999 ENSG00000039068 CDH1
desmoplakin This gene encodes a protein that anchors intermediate filaments to desmosomal plaques and forms an obligate component of functional desmosomes. Mutations in this gene are the cause of several cardiomyopathies and keratodermas, including skin fragility-woolly hair syndrome. Alternative splicing results in multiple transcript variants. 1832 ENSG00000096696 DSP
RAP1 GTPase activating protein This gene encodes a type of GTPase-activating-protein (GAP) that down-regulates the activity of the ras-related RAP1 protein. RAP1 acts as a molecular switch by cycling between an inactive GDP-bound form and an active GTP-bound form. The product of this gene, RAP1GAP, promotes the hydrolysis of bound GTP and hence returns RAP1 to the inactive state whereas other proteins, guanine nucleotide exchange factors (GEFs), act as RAP1 activators by facilitating the conversion of RAP1 from the GDP- to the GTP-bound form. In general, ras subfamily proteins, such as RAP1, play key roles in receptor-linked signaling pathways that control cell growth and differentiation. RAP1 plays a role in diverse processes such as cell proliferation, adhesion, differentiation, and embryogenesis. Alternative splicing results in multiple transcript variants encoding distinct proteins. 5909 ENSG00000076864 RAP1GAP
troponin I3, cardiac type Troponin I (TnI), along with troponin T (TnT) and troponin C (TnC), is one of 3 subunits that form the troponin complex of the thin filaments of striated muscle. TnI is the inhibitory subunit; blocking actin-myosin interactions and thereby mediating striated muscle relaxation. The TnI subfamily contains three genes: TnI-skeletal-fast-twitch, TnI-skeletal-slow-twitch, and TnI-cardiac. This gene encodes the TnI-cardiac protein and is exclusively expressed in cardiac muscle tissues. Mutations in this gene cause familial hypertrophic cardiomyopathy type 7 (CMH7) and familial restrictive cardiomyopathy (RCM). 7137 ENSG00000129991 TNNI3
prolactin induced protein NA 5304 ENSG00000159763 PIP
EPS8 like 1 This gene encodes a protein that is related to epidermal growth factor receptor pathway substrate 8 (EPS8), a substrate for the epidermal growth factor receptor. The function of this protein is unknown. At least two alternatively spliced transcript variants encoding different isoforms have been found for this gene. 54869 ENSG00000131037 EPS8L1
alcohol dehydrogenase 1B (class I), beta polypeptide The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. 125 ENSG00000196616 ADH1B
fibronectin type III domain containing 4 NA 64838 ENSG00000115226 FNDC4
cathepsin V The protein encoded by this gene, a member of the peptidase C1 family, is a lysosomal cysteine proteinase that may play an important role in corneal physiology. This gene is expressed in colorectal and breast carcinomas but not in normal colon, mammary gland, or peritumoral tissues, suggesting a possible role for this gene in tumor processes. Alternatively spliced variants, encoding the same protein, have been identified. 1515 ENSG00000136943 CTSV
sphingosine-1-phosphate phosphatase 2 The protein encoded by this gene is a transmembrane protein that degrades the bioactive signaling molecule sphingosine 1-phosphate. The encoded protein is induced during inflammatory responses and has been shown to be downregulated by the microRNA-31 tumor suppressor. Alternative splice variants encoding different isoforms have been found for this gene. 130367 ENSG00000163082 SGPP2
ankyrin repeat domain 1 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. 27063 ENSG00000148677 ANKRD1
enolase 1, (alpha) pseudogene 1 NA ENSG00000244457 ENSG00000244457 ENO1P1
keratin 10 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. 3858 ENSG00000186395 KRT10
RAB25, member RAS oncogene family The protein encoded by this gene is a member of the RAS superfamily of small GTPases. The encoded protein is involved in membrane trafficking and cell survival. This gene has been found to be a tumor suppressor and an oncogene, depending on the context. Two variants, one protein-coding and the other not, have been found for this gene. 57111 ENSG00000132698 RAB25
NA NA ENSG00000261286 ENSG00000261286 RP11-517C16.2
troponin C1, slow skeletal and cardiac type Troponin is a central regulatory protein of striated muscle contraction, and together with tropomyosin, is located on the actin filament. Troponin consists of 3 subunits: TnI, which is the inhibitor of actomyosin ATPase; TnT, which contains the binding site for tropomyosin; and TnC, the protein encoded by this gene. The binding of calcium to TnC abolishes the inhibitory action of TnI, thus allowing the interaction of actin with myosin, the hydrolysis of ATP, and the generation of tension. Mutations in this gene are associated with cardiomyopathy dilated type 1Z. 7134 ENSG00000114854 TNNC1
keratin 14 This gene encodes a member of the keratin family, the most diverse group of intermediate filaments. This gene product, a type I keratin, is usually found as a heterotetramer with two keratin 5 molecules, a type II keratin. Together they form the cytoskeleton of epithelial cells. Mutations in the genes for these keratins are associated with epidermolysis bullosa simplex. At least one pseudogene has been identified at 17p12-p11. 3861 ENSG00000186847 KRT14
phosphatidylethanolamine binding protein 4 The phosphatidylethanolamine (PE)-binding proteins, including PEBP4, are an evolutionarily conserved family of proteins with pivotal biologic functions, such as lipid binding and inhibition of serine proteases (Wang et al., 2004 [PubMed 15302887]). 157310 ENSG00000134020 PEBP4
aldehyde oxidase 1 Aldehyde oxidase produces hydrogen peroxide and, under certain conditions, can catalyze the formation of superoxide. Aldehyde oxidase is a candidate gene for amyotrophic lateral sclerosis. 316 ENSG00000138356 AOX1
beta-1,4-N-acetyl-galactosaminyltransferase 3 B4GALNT3 transfers N-acetylgalactosamine (GalNAc) onto glucosyl residues to form N,N-prime-diacetyllactosediamine (LacdiNAc, or LDN), a unique terminal structure of cell surface N-glycans (Ikehara et al., 2006 [PubMed 16728562]). 283358 ENSG00000139044 B4GALNT3
troponin T1, slow skeletal type This gene encodes a protein that is a subunit of troponin, which is a regulatory complex located on the thin filament of the sarcomere. This complex regulates striated muscle contraction in response to fluctuations in intracellular calcium concentration. This complex is composed of three subunits: troponin C, which binds calcium, troponin T, which binds tropomyosin, and troponin I, which is an inhibitory subunit. This protein is the slow skeletal troponin T subunit. Mutations in this gene cause nemaline myopathy type 5, also known as Amish nemaline myopathy, a neuromuscular disorder characterized by muscle weakness and rod-shaped, or nemaline, inclusions in skeletal muscle fibers which affects infants, resulting in death due to respiratory insufficiency, usually in the second year. Multiple transcript variants encoding different isoforms have been found for this gene. 7138 ENSG00000105048 TNNT1
suprabasin NA 374897 ENSG00000189001 SBSN
NA NA ENSG00000231864 ENSG00000231864 RP11-229P13.23
protease, serine 8 This gene encodes a member of the peptidase S1 or chymotrypsin family of serine proteases. The encoded preproprotein is proteolytically processed to generate light and heavy chains that associate via a disulfide bond to form the heterodimeric enzyme. This enzyme is highly expressed in prostate epithelia and is one of several proteolytic enzymes found in seminal fluid. This protease exhibits trypsin-like substrate specificity, cleaving protein substrates at the carboxyl terminus of lysine or arginine residues. The encoded protease partially mediates proteolytic activation of the epithelial sodium channel, a regulator of sodium balance, and may also play a role in epithelial barrier formation. 5652 ENSG00000052344 PRSS8
nebulin related anchoring protein NA 4892 ENSG00000197893 NRAP
calcium/calmodulin dependent protein kinase II beta The product of this gene belongs to the serine/threonine protein kinase family and to the Ca(2+)/calmodulin-dependent protein kinase subfamily. Calcium signaling is crucial for several aspects of plasticity at glutamatergic synapses. In mammalian cells, the enzyme is composed of four different chains: alpha, beta, gamma, and delta. The product of this gene is a beta chain. It is possible that distinct isoforms of this chain have different cellular localizations and interact differently with calmodulin. Alternative splicing results in multiple transcript variants. 816 ENSG00000058404 CAMK2B
NA NA ENSG00000258444 ENSG00000258444 CTD-2201G16.1
SRY-box 9 The protein encoded by this gene recognizes the sequence CCTTGAG along with other members of the HMG-box class DNA-binding proteins. It acts during chondrocyte differentiation and, with steroidogenic factor 1, regulates transcription of the anti-Muellerian hormone (AMH) gene. Deficiencies lead to the skeletal malformation syndrome campomelic dysplasia, frequently with sex reversal. 6662 ENSG00000125398 SOX9
family with sequence similarity 83 member H The protein encoded by this gene plays an important role in the structural development and calcification of tooth enamel. Defects in this gene are a cause of amelogenesis imperfecta type 3 (AI3). 286077 ENSG00000180921 FAM83H
sulfotransferase family 2B member 1 Sulfotransferase enzymes catalyze the sulfate conjugation of many hormones, neurotransmitters, drugs, and xenobiotic compounds. These cytosolic enzymes are different in their tissue distributions and substrate specificities. The gene structure (number and length of exons) is similar among family members. This gene sulfates dehydroepiandrosterone but not 4-nitrophenol, a typical substrate for the phenol and estrogen sulfotransferase subfamilies. Two alternatively spliced variants that encode different isoforms have been described. 6820 ENSG00000088002 SULT2B1
6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 2 The protein encoded by this gene is involved in both the synthesis and degradation of fructose-2,6-bisphosphate, a regulatory molecule that controls glycolysis in eukaryotes. The encoded protein has a 6-phosphofructo-2-kinase activity that catalyzes the synthesis of fructose-2,6-bisphosphate, and a fructose-2,6-biphosphatase activity that catalyzes the degradation of fructose-2,6-bisphosphate. This protein regulates fructose-2,6-bisphosphate levels in the heart, while a related enzyme encoded by a different gene regulates fructose-2,6-bisphosphate levels in the liver and muscle. This enzyme functions as a homodimer. Two transcript variants encoding two different isoforms have been found for this gene. 5208 ENSG00000123836 PFKFB2
suppressor APC domain containing 2 NA 89958 ENSG00000186193 SAPCD2
LIM domain 7 This gene encodes a protein containing a calponin homology (CH) domain, a PDZ domain, and a LIM domain, and may be involved in protein-protein interactions. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene, however, the full-length nature of some variants is not known. 4008 ENSG00000136153 LMO7
myosin, heavy chain 7, cardiac muscle, beta Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. 4625 ENSG00000092054 MYH7
protein kinase C zeta Protein kinase C (PKC) zeta is a member of the PKC family of serine/threonine kinases which are involved in a variety of cellular processes such as proliferation, differentiation and secretion. Unlike the classical PKC isoenzymes which are calcium-dependent, PKC zeta exhibits a kinase activity which is independent of calcium and diacylglycerol but not of phosphatidylserine. Furthermore, it is insensitive to typical PKC inhibitors and cannot be activated by phorbol ester. Unlike the classical PKC isoenzymes, it has only a single zinc finger module. These structural and biochemical properties indicate that the zeta subspecies is related to, but distinct from other isoenzymes of PKC. Alternative splicing results in multiple transcript variants encoding different isoforms. 5590 ENSG00000067606 PRKCZ
myoglobin This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. 4151 ENSG00000198125 MB
formin binding protein 1 pseudogene 1 NA ENSG00000257800 ENSG00000257800 FNBP1P1
F-box and leucine rich repeat protein 16 Members of the F-box protein family, such as FBXL16, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]). 146330 ENSG00000127585 FBXL16
solute carrier family 7 member 11 This gene encodes a member of a heteromeric, sodium-independent, anionic amino acid transport system that is highly specific for cysteine and glutamate. In this system, designated Xc(-), the anionic form of cysteine is transported in exchange for glutamate. This protein has been identified as the predominant mediator of Kaposi sarcoma-associated herpesvirus fusion and entry permissiveness into cells. Also, increased expression of this gene in primary gliomas (compared to normal brain tissue) was associated with increased glutamate secretion via the XCT channels, resulting in neuronal cell death. 23657 ENSG00000151012 SLC7A11
lymphocyte antigen 6 complex, locus G6C LY6G6C belongs to a cluster of leukocyte antigen-6 (LY6) genes located in the major histocompatibility complex (MHC) class III region on chromosome 6. Members of the LY6 superfamily typically contain 70 to 80 amino acids, including 8 to 10 cysteines. Most LY6 proteins are attached to the cell surface by a glycosylphosphatidylinositol (GPI) anchor that is directly involved in signal transduction (Mallya et al., 2002 [PubMed 12079290]). 80740 ENSG00000204421 LY6G6C
aldehyde dehydrogenase 1 family member A3 This gene encodes an aldehyde dehydrogenase enzyme that uses retinal as a substrate. Mutations in this gene have been associated with microphthalmia, isolated 8, and expression changes have also been detected in tumor cells. Alternative splicing results in multiple transcript variants. 220 ENSG00000184254 ALDH1A3
solute carrier family 2 member 1 This gene encodes a major glucose transporter in the mammalian blood-brain barrier. The encoded protein is found primarily in the cell membrane and on the cell surface, where it can also function as a receptor for human T-cell leukemia virus (HTLV) I and II. Mutations in this gene have been found in a family with paroxysmal exertion-induced dyskinesia. 6513 ENSG00000117394 SLC2A1
phospholipase A2 group IIA The protein encoded by this gene is a member of the phospholipase A2 family (PLA2). PLA2s constitute a diverse family of enzymes with respect to sequence, function, localization, and divalent cation requirements. This gene product belongs to group II, which contains the secreted form of PLA2, an extracellular enzyme that has a low molecular mass and requires calcium ions for catalysis. It catalyzes the hydrolysis of the sn-2 fatty acid acyl ester bond of phosphoglycerides, releasing free fatty acids and lysophospholipids, and is thought to participate in the regulation of the phospholipid metabolism in biomembranes. Several alternatively spliced transcript variants with different 5’ UTRs have been found for this gene. 5320 ENSG00000188257 PLA2G2A
family with sequence similarity 198 member A NA 729085 ENSG00000144649 FAM198A
metallothionein 1A NA 4489 ENSG00000205362 MT1A
myosin light chain 2 This gene encodes the regulatory light chain associated with cardiac myosin beta (or slow) heavy chain. Ca2+ triggers the phosphorylation of regulatory light chain that in turn triggers contraction. Mutations in this gene are associated with mid-left ventricular chamber type hypertrophic cardiomyopathy. 4633 ENSG00000111245 MYL2
chloride intracellular channel 3 Chloride channels are a diverse group of proteins that regulate fundamental cellular processes including stabilization of cell membrane potential, transepithelial transport, maintenance of intracellular pH, and regulation of cell volume. Chloride intracellular channel 3 is a member of the p64 family and is predominantly localized in the nucleus and stimulates chloride ion channel activity. In addition, this protein may participate in cellular growth control, based on its association with ERK7, a member of the MAP kinase family. 9022 ENSG00000169583 CLIC3
tumor necrosis factor receptor superfamily member 19 The protein encoded by this gene is a member of the TNF-receptor superfamily. This receptor is highly expressed during embryonic development. It has been shown to interact with TRAF family members, and to activate JNK signaling pathway when overexpressed in cells. This receptor is capable of inducing apoptosis by a caspase-independent mechanism, and it is thought to play an essential role in embryonic development. Alternatively spliced transcript variants encoding distinct isoforms have been described. 55504 ENSG00000127863 TNFRSF19
cysteine dioxygenase type 1 NA 1036 ENSG00000129596 CDO1
stratifin NA 2810 ENSG00000175793 SFN
heat shock protein family B (small) member 6 This locus encodes a heat shock protein. The encoded protein likely plays a role in smooth muscle relaxation. 126393 ENSG00000004776 HSPB6
RAB11 family interacting protein 4 Proteins of the large Rab GTPase family (see RAB1A; MIM 179508) have regulatory roles in the formation, targeting, and fusion of intracellular transport vesicles. RAB11FIP4 is one of many proteins that interact with and regulate Rab GTPases (Hales et al., 2001 [PubMed 11495908]). 84440 ENSG00000131242 RAB11FIP4
von Willebrand factor A domain containing 7 NA 80737 ENSG00000204396 VWA7
oxysterol binding protein like 3 This gene encodes a member of the oxysterol-binding protein (OSBP) family, a group of intracellular lipid receptors. Most members contain an N-terminal pleckstrin homology domain and a highly conserved C-terminal OSBP-like sterol-binding domain. The encoded protein is involved in the regulation of cell adhesion and organization of the actin cytoskeleton. Alternative splicing results in multiple transcript variants. 26031 ENSG00000070882 OSBPL3
galectin 7B The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. Differential and in situ hybridization studies indicate that this lectin is specifically expressed in keratinocytes and found mainly in stratified squamous epithelium. A duplicate copy of this gene (GeneID:3963) is found adjacent to, but on the opposite strand on chromosome 19. 653499 ENSG00000178934 LGALS7B
NA NA ENSG00000267328 ENSG00000267328 AC002398.12
protein kinase domain containing, cytoplasmic NA 91461 ENSG00000162878 PKDCC
glutathione S-transferase omega 2 The protein encoded by this gene is an omega class glutathione S-transferase (GST). GSTs are involved in the metabolism of xenobiotics and carcinogens. Four transcript variants encoding different isoforms have been found for this gene. 119391 ENSG00000065621 GSTO2
NA NA ENSG00000240801 ENSG00000240801 AC132217.4
tensin 2 The protein encoded by this gene belongs to the tensin family. Tensin is a focal adhesion molecule that binds to actin filaments and participates in signaling pathways. This protein plays a role in regulating cell migration. Alternative splicing occurs at this locus and three transcript variants encoding three distinct isoforms have been identified. 23371 ENSG00000111077 TNS2
stonin 1 Endocytosis of cell surface proteins is mediated by a complex molecular machinery that assembles on the inner surface of the plasma membrane. This gene encodes one of two human homologs of the Drosophila melanogaster stoned B protein. This protein is related to components of the endocytic machinery and exhibits a modular structure consisting of an N-terminal proline-rich domain, a central region of homology specific to the human stoned B-like proteins, and a C-terminal region homologous to the mu subunits of adaptor protein (AP) complexes. Read-through transcription of this gene into the neighboring downstream gene, which encodes TFIIA-alpha/beta-like factor, generates a transcript (SALF), which encodes a fusion protein comprised of sequence sharing identity with each individual gene product. Alternative splicing results in multiple transcript variants. 11037 ENSG00000243244 STON1
retinoic acid receptor responder 2 This gene encodes a secreted chemotactic protein that initiates chemotaxis via the ChemR23 G protein-coupled seven-transmembrane domain ligand. Expression of this gene is upregulated by the synthetic retinoid tazarotene and occurs in a wide variety of tissues. The active protein has several roles, including that as an adipokine and as an antimicrobial protein with activity against bacteria and fungi. 5919 ENSG00000106538 RARRES2
inositol polyphosphate-5-phosphatase J NA 27124 ENSG00000185133 INPP5J
NA NA ENSG00000267265 ENSG00000267265 CTC-550B14.7
epithelial cell adhesion molecule This gene encodes a carcinoma-associated antigen and is a member of a family that includes at least two type I membrane proteins. This antigen is expressed on most normal epithelial cells and gastrointestinal carcinomas and functions as a homotypic calcium-independent cell adhesion molecule. The antigen is being used as a target for immunotherapy treatment of human carcinomas. Mutations in this gene result in congenital tufting enteropathy. 4072 ENSG00000119888 EPCAM
NA NA ENSG00000235609 ENSG00000235609 AF127936.9
NA NA ENSG00000229047 ENSG00000229047 AF127577.10
zymogen granule protein 16B NA 124220 ENSG00000162078 ZG16B
six transmembrane epithelial antigen of the prostate 1 This gene is predominantly expressed in prostate tissue, and is found to be upregulated in multiple cancer cell lines. The gene product is predicted to be a six-transmembrane protein, and was shown to be a cell surface antigen significantly expressed at cell-cell junctions. 26872 ENSG00000164647 STEAP1
parathyroid hormone 1 receptor The protein encoded by this gene is a member of the G-protein coupled receptor family 2. This protein is a receptor for parathyroid hormone (PTH) and for parathyroid hormone-like hormone (PTHLH). The activity of this receptor is mediated by G proteins which activate adenylyl cyclase and also a phosphatidylinositol-calcium second messenger system. Defects in this receptor are known to be the cause of Jansen’s metaphyseal chondrodysplasia (JMC), chondrodysplasia Blomstrand type (BOCD), as well as enchondromatosis. Two transcript variants encoding the same protein have been found for this gene. 5745 ENSG00000160801 PTH1R
myosin, heavy chain 6, cardiac muscle, alpha Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. 4624 ENSG00000197616 MYH6
mitogen-activated protein kinase-activated protein kinase 3 This gene encodes a member of the Ser/Thr protein kinase family. This kinase functions as a mitogen-activated protein kinase (MAP kinase)-activated protein kinase. MAP kinases, also known as extracellular signal-regulated kinases (ERKs), act as an integration point for multiple biochemical signals. This kinase was shown to be activated by growth inducers and stress stimulation of cells. In vitro studies demonstrated that ERK, p38 MAP kinase and Jun N-terminal kinase were all able to phosphorylate and activate this kinase, which suggested the role of this kinase as an integrative element of signaling in both mitogen and stress responses. This kinase was reported to interact with, phosphorylate and repress the activity of E47, which is a basic helix-loop-helix transcription factor known to be involved in the regulation of tissue-specific gene expression and cell differentiation. Alternate splicing results in multiple transcript variants that encode the same protein. 7867 ENSG00000114738 MAPKAPK3
bone morphogenetic protein 1 This gene encodes a protein that is capable of inducing formation of cartilage in vivo. Although other bone morphogenetic proteins are members of the TGF-beta superfamily, this gene encodes a protein that is not closely related to other known growth factors. This gene is expressed as alternatively spliced variants that share an N-terminal protease domain but differ in their C-terminal region. 649 ENSG00000168487 BMP1
cysteine and glycine rich protein 3 This gene encodes a member of the CSRP family of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this protein is found in a group of proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Mutations in this gene are thought to cause heritable forms of hypertrophic cardiomyopathy (HCM) and dilated cardiomyopathy (DCM) in humans. Alternatively spliced transcript variants with different 5’ UTR, but encoding the same protein, have been found for this gene. 8048 ENSG00000129170 CSRP3
NA NA ENSG00000224635 ENSG00000224635 RP4-564F22.5
angiopoietin like 8 NA 55908 ENSG00000130173 ANGPTL8
thyroglobulin Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. 7038 ENSG00000042832 TG
actin, alpha 1, skeletal muscle The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. 58 ENSG00000143632 ACTA1
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_",13,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
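
The export step above is repeated for each factor, with only the factor index changing. A minimal sketch of wrapping it in a helper is shown here. The function name `write_factor_genes` and its `out_dir` argument are hypothetical and do not appear in the original analysis, and the example writes to a temporary directory rather than to the project's `../utilities/` path.

```r
# Hypothetical helper (not part of the original analysis): write the queried
# Ensembl IDs for one factor to a plain-text file, one ID per line.
write_factor_genes <- function(query_ids, factor_index, out_dir) {
  out_file <- file.path(out_dir,
                        paste0("gene_names_clus_", factor_index, ".txt"))
  write.table(as.character(query_ids), out_file,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  # Return the path invisibly so callers can check or reuse it.
  invisible(out_file)
}

# Toy input: two Ensembl IDs that appear in the Factor 13 table above.
ids <- c("ENSG00000076864", "ENSG00000129991")
path <- write_factor_genes(ids, 13, tempdir())
readLines(path)
```

In the real workflow, `query_ids` would presumably be `out$query` and `out_dir` the existing output folder; converting to character avoids relying on how `write.table` prints factors.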

Factor 14 Annotations

out <- mygene::queryMany(gene_list[14,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
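
The message above notes that missing query terms are not listed separately by default. In the table that follows, such rows are marked by `TRUE` in the `notfound` column, with `NA` for queries that matched. A small sketch of separating the two cases is given here, using a mock data frame instead of a live `mygene::queryMany` call. The object `out_mock` is hypothetical, and its three rows are copied from the first three rows of the Factor 14 table.

```r
# Mock of the annotation result; the real `out` comes from mygene::queryMany.
# Only the columns needed for the illustration are included.
out_mock <- data.frame(
  symbol   = c("TNFRSF11B", NA, "TNFSF10"),
  query    = c("ENSG00000164761", "ENSG00000117289", "ENSG00000121858"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Rows with notfound == TRUE had no match; NA means the query was found.
is_missing  <- !is.na(out_mock$notfound) & out_mock$notfound
missing_ids <- out_mock$query[is_missing]
annotated   <- out_mock[!is_missing, ]

missing_ids
nrow(annotated)
```

Checking `is.na()` first keeps the logical index free of `NA` values, which would otherwise introduce all-`NA` rows when subsetting.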
kable(as.data.frame(out))
symbol query summary name X_id notfound
TNFRSF11B ENSG00000164761 The protein encoded by this gene is a member of the TNF-receptor superfamily. This protein is an osteoblast-secreted decoy receptor that functions as a negative regulator of bone resorption. This protein specifically binds to its ligand, osteoprotegerin ligand, both of which are key extracellular regulators of osteoclast development. Studies of the mouse counterpart also suggest that this protein and its ligand play a role in lymph-node organogenesis and vascular calcification. Alternatively spliced transcript variants of this gene have been reported, but their full length nature has not been determined. tumor necrosis factor receptor superfamily member 11b 4982 NA
NA ENSG00000117289 NA NA NA TRUE
TNFSF10 ENSG00000121858 The protein encoded by this gene is a cytokine that belongs to the tumor necrosis factor (TNF) ligand family. This protein preferentially induces apoptosis in transformed and tumor cells, but does not appear to kill normal cells although it is expressed at a significant level in most normal tissues. This protein binds to several members of TNF receptor superfamily including TNFRSF10A/TRAILR1, TNFRSF10B/TRAILR2, TNFRSF10C/TRAILR3, TNFRSF10D/TRAILR4, and possibly also to TNFRSF11B/OPG. The activity of this protein may be modulated by binding to the decoy receptors TNFRSF10C/TRAILR3, TNFRSF10D/TRAILR4, and TNFRSF11B/OPG that cannot induce apoptosis. The binding of this protein to its receptors has been shown to trigger the activation of MAPK8/JNK, caspase 8, and caspase 3. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. tumor necrosis factor superfamily member 10 8743 NA
SLC2A4 ENSG00000181856 This gene is a member of the solute carrier family 2 (facilitated glucose transporter) family and encodes a protein that functions as an insulin-regulated facilitative glucose transporter. In the absence of insulin, this integral membrane protein is sequestered within the cells of muscle and adipose tissue. Within minutes of insulin stimulation, the protein moves to the cell surface and begins to transport glucose across the cell membrane. Mutations in this gene have been associated with noninsulin-dependent diabetes mellitus (NIDDM). solute carrier family 2 member 4 6517 NA
HSPA7 ENSG00000225217 NA heat shock protein family A (Hsp70) member 7 ENSG00000225217 NA
GPR176 ENSG00000166073 Members of the G protein-coupled receptor family, such as GPR176, are cell surface receptors involved in responses to hormones, growth factors, and neurotransmitters (Hata et al., 1995 [PubMed 7893747]). G protein-coupled receptor 176 11245 NA
SNCA ENSG00000145335 Alpha-synuclein is a member of the synuclein family, which also includes beta- and gamma-synuclein. Synucleins are abundantly expressed in the brain and alpha- and beta-synuclein inhibit phospholipase D2 selectively. SNCA may serve to integrate presynaptic signaling and membrane trafficking. Defects in SNCA have been implicated in the pathogenesis of Parkinson disease. SNCA peptides are a major component of amyloid plaques in the brains of patients with Alzheimer’s disease. Four alternatively spliced transcripts encoding two different isoforms have been identified for this gene. synuclein alpha 6622 NA
P2RY1 ENSG00000169860 The product of this gene belongs to the family of G-protein coupled receptors. This family has several receptor subtypes with different pharmacological selectivity, which overlaps in some cases, for various adenosine and uridine nucleotides. This receptor functions as a receptor for extracellular ATP and ADP. In platelets binding to ADP leads to mobilization of intracellular calcium ions via activation of phospholipase C, a change in platelet shape, and probably to platelet aggregation. purinergic receptor P2Y1 5028 NA
ST6GALNAC2 ENSG00000070731 ST6GALNAC2 belongs to a family of sialyltransferases that add sialic acids to the nonreducing ends of glycoconjugates. At the cell surface, these modifications have roles in cell-cell and cell-substrate interactions, bacterial adhesion, and protein targeting (Samyn-Petit et al., 2000 [PubMed 10742600]). ST6 N-acetylgalactosaminide alpha-2,6-sialyltransferase 2 10610 NA
CST6 ENSG00000175315 The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and the kininogens. The type 2 cystatin proteins are a class of cysteine proteinase inhibitors found in a variety of human fluids and secretions, where they appear to provide protective functions. This gene encodes a cystatin from the type 2 family, which is down-regulated in metastatic breast tumor cells as compared to primary tumor cells. Loss of expression is likely associated with the progression of a primary tumor to a metastatic phenotype. cystatin E/M 1474 NA
MTHFD1L ENSG00000120254 The protein encoded by this gene is involved in the synthesis of tetrahydrofolate (THF) in the mitochondrion. THF is important in the de novo synthesis of purines and thymidylate and in the regeneration of methionine from homocysteine. Several transcript variants encoding different isoforms have been found for this gene. methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 1-like 25902 NA
SERPINE1 ENSG00000106366 This gene encodes a member of the serine proteinase inhibitor (serpin) superfamily. This member is the principal inhibitor of tissue plasminogen activator (tPA) and urokinase (uPA), and hence is an inhibitor of fibrinolysis. Defects in this gene are the cause of plasminogen activator inhibitor-1 deficiency (PAI-1 deficiency), and high concentrations of the gene product are associated with thrombophilia. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. serpin family E member 1 5054 NA
RP11-389C8.2 ENSG00000261269 NA NA ENSG00000261269 NA
ZNF215 ENSG00000149054 NA zinc finger protein 215 7762 NA
H19 ENSG00000130600 This gene is located in an imprinted region of chromosome 11 near the insulin-like growth factor 2 (IGF2) gene. This gene is only expressed from the maternally-inherited chromosome, whereas IGF2 is only expressed from the paternally-inherited chromosome. The product of this gene is a long non-coding RNA which functions as a tumor suppressor. Mutations in this gene have been associated with Beckwith-Wiedemann Syndrome and Wilms tumorigenesis. Alternative splicing results in multiple transcript variants. H19, imprinted maternally expressed transcript (non-protein coding) 283120 NA
PLIN5 ENSG00000214456 Members of the perilipin family, such as PLIN5, coat intracellular lipid storage droplets and protect them from lipolytic degradation (Dalen et al., 2007 [PubMed 17234449]). perilipin 5 440503 NA
ADRB2 ENSG00000169252 This gene encodes beta-2-adrenergic receptor which is a member of the G protein-coupled receptor superfamily. This receptor is directly associated with one of its ultimate effectors, the class C L-type calcium channel Ca(V)1.2. This receptor-channel complex also contains a G protein, an adenylyl cyclase, cAMP-dependent kinase, and the counterbalancing phosphatase, PP2A. The assembly of the signaling complex provides a mechanism that ensures specific and rapid signaling by this G protein-coupled receptor. This gene is intronless. Different polymorphic forms, point mutations, and/or downregulation of this gene are associated with nocturnal asthma, obesity and type 2 diabetes. adrenoceptor beta 2 154 NA
RBP7 ENSG00000162444 Due to its chemical instability and low solubility in aqueous solution, vitamin A requires cellular retinol-binding proteins (CRBPs), such as RBP7, for stability, internalization, intercellular transfer, homeostasis, and metabolism. retinol binding protein 7 116362 NA
CARD14 ENSG00000141527 This gene encodes a caspase recruitment domain-containing protein that is a member of the membrane-associated guanylate kinase (MAGUK) family of proteins. Members of this protein family are scaffold proteins that are involved in a diverse array of cellular processes including cellular adhesion, signal transduction and cell polarity control. This protein has been shown to specifically interact with BCL10, a protein known to function as a positive regulator of cell apoptosis and NF-kappaB activation. Alternate splicing results in multiple transcript variants. caspase recruitment domain family member 14 79092 NA
HCP5 ENSG00000206337 NA HLA complex P5 (non-protein coding) 10866 NA
SYNGR1 ENSG00000100321 This gene encodes an integral membrane protein associated with presynaptic vesicles in neuronal cells. The exact function of this protein is unclear, but studies of a similar murine protein suggest that it functions in synaptic plasticity without being required for synaptic transmission. The gene product belongs to the synaptogyrin gene family. Three alternatively spliced variants encoding three different isoforms have been identified. synaptogyrin 1 9145 NA
NDRG2 ENSG00000165795 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that may play a role in neurite outgrowth. This gene may be involved in glioblastoma carcinogenesis. Several alternatively spliced transcript variants of this gene have been described, but the full-length nature of some of these variants has not been determined. NDRG family member 2 57447 NA
ITGA10 ENSG00000143127 Integrins are integral transmembrane glycoproteins composed of noncovalently linked alpha and beta chains. They participate in cell adhesion as well as cell-surface mediated signalling. This gene encodes an integrin alpha chain and is expressed at high levels in chondrocytes, where it is transcriptionally regulated by AP-2epsilon and Ets-1. The protein encoded by this gene binds to collagen. Alternative splicing results in multiple transcript variants. integrin subunit alpha 10 8515 NA
GIMAP5 ENSG00000196329 This gene encodes a protein belonging to the GTP-binding superfamily and to the immuno-associated nucleotide (IAN) subfamily of nucleotide-binding proteins. In humans, the IAN subfamily genes are located in a cluster at 7q36.1. This gene encodes an antiapoptotic protein that functions in T-cell survival. Polymorphisms in this gene are associated with systemic lupus erythematosus. Read-through transcription exists between this gene and the neighboring upstream GIMAP1 (GTPase, IMAP family member 1) gene. GTPase, IMAP family member 5 55340 NA
CTGF ENSG00000118523 The protein encoded by this gene is a mitogen that is secreted by vascular endothelial cells. The encoded protein plays a role in chondrocyte proliferation and differentiation, cell adhesion in many cell types, and is related to platelet-derived growth factor. Certain polymorphisms in this gene have been linked with a higher incidence of systemic sclerosis. connective tissue growth factor 1490 NA
NA ENSG00000241732 NA NA NA TRUE
ARHGAP25 ENSG00000163219 ARHGAPs, such as ARHGAP25, encode negative regulators of Rho GTPases (see ARHA; MIM 165390), which are implicated in actin remodeling, cell polarity, and cell migration (Katoh and Katoh, 2004 [PubMed 15254788]). Rho GTPase activating protein 25 9938 NA
ARHGEF4 ENSG00000136002 Rho GTPases play a fundamental role in numerous cellular processes that are initiated by extracellular stimuli that work through G protein coupled receptors. The protein encoded by this gene may form complex with G proteins and stimulate Rho-dependent signals. Multiple alternatively spliced transcript variants encoding different isoforms have been found, but the full-length nature of some variants has not been determined. Rho guanine nucleotide exchange factor 4 50649 NA
RP11-315I20.3 ENSG00000244619 NA NA ENSG00000244619 NA
TRIM63 ENSG00000158022 This gene encodes a member of the RING zinc finger protein family found in striated muscle and iris. The product of this gene is an E3 ubiquitin ligase that localizes to the Z-line and M-line lattices of myofibrils. This protein plays an important role in the atrophy of skeletal and cardiac muscle and is required for the degradation of myosin heavy chain proteins, myosin light chain, myosin binding protein, and for muscle-type creatine kinase. tripartite motif containing 63 84676 NA
ABCG1 ENSG00000160179 The protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intra-cellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the White subfamily. It is involved in macrophage cholesterol and phospholipids transport, and may regulate cellular lipid homeostasis in other cell types. Six alternative splice variants have been identified. ATP binding cassette subfamily G member 1 9619 NA
SPRR2E ENSG00000203785 This gene encodes a member of a family of small proline-rich proteins clustered in the epidermal differentiation complex on chromosome 1q21. The encoded protein, along with other family members, is a component of the cornified cell envelope that forms beneath the plasma membrane in terminally differentiated stratified squamous epithelia. This envelope serves as a barrier against extracellular and environmental factors. The seven SPRR2 genes (A-G) appear to have been homogenized by gene conversion compared to others in the cluster that exhibit greater differences in protein structure. small proline rich protein 2E 6704 NA
TPH1 ENSG00000129167 This gene encodes a member of the aromatic amino acid hydroxylase family. The encoded protein catalyzes the first and rate limiting step in the biosynthesis of serotonin, an important hormone and neurotransmitter. Mutations in this gene have been associated with an elevated risk for a variety of diseases and disorders, including schizophrenia, somatic anxiety, anger-related traits, bipolar disorder, suicidal behavior, addictions, and others. tryptophan hydroxylase 1 7166 NA
RAI14 ENSG00000039560 NA retinoic acid induced 14 26064 NA
RHCG ENSG00000140519 NA Rh family C glycoprotein 51458 NA
BCAR1 ENSG00000050820 BCAR1, or CAS, is an Src (MIM 190090) family kinase substrate involved in various cellular events, including migration, survival, transformation, and invasion (Sawada et al., 2006 [PubMed 17129785]). BCAR1, Cas family scaffolding protein 9564 NA
NMNAT3 ENSG00000163864 This gene encodes a member of the nicotinamide/nicotinic acid mononucleotide adenylyltransferase family. These enzymes use ATP to catalyze the synthesis of nicotinamide adenine dinucleotide or nicotinic acid adenine dinucleotide from nicotinamide mononucleotide or nicotinic acid mononucleotide, respectively. The encoded protein is localized to mitochondria and may also play a neuroprotective role as a molecular chaperone. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. nicotinamide nucleotide adenylyltransferase 3 349565 NA
CELSR3 ENSG00000008300 This gene belongs to the flamingo subfamily, which is included in the cadherin superfamily. The flamingo cadherins consist of nonclassic-type cadherins that do not interact with catenins. They are plasma membrane proteins containing seven epidermal growth factor-like repeats, nine cadherin domains and two laminin A G-type repeats in their ectodomain. They also have seven transmembrane domains, a characteristic feature of their subfamily. The encoded protein may be involved in the regulation of contact-dependent neurite growth and may play a role in tumor formation. cadherin EGF LAG seven-pass G-type receptor 3 1951 NA
ABLIM1 ENSG00000099204 This gene encodes a cytoskeletal LIM protein that binds to actin filaments via a domain that is homologous to erythrocyte dematin. LIM domains, found in over 60 proteins, play key roles in the regulation of developmental pathways. LIM domains also function as protein-binding interfaces, mediating specific protein-protein interactions. The protein encoded by this gene could mediate such interactions between actin filaments and cytoplasmic targets. Alternatively spliced transcript variants encoding different isoforms have been identified. actin binding LIM protein 1 3983 NA
SOX15 ENSG00000129194 This gene encodes a member of the SOX (SRY-related HMG-box) family of transcription factors involved in the regulation of embryonic development and in the determination of the cell fate. The encoded protein may act as a transcriptional regulator after forming a protein complex with other proteins. SRY-box 15 6665 NA
SIPA1L2 ENSG00000116991 This gene encodes a member of the signal-induced proliferation-associated 1 like family. Members of this family contain a GTPase activating domain, a PDZ domain and a C-terminal coiled-coil domain with a leucine zipper. A similar protein in rat acts as a GTPase-activating protein for the small GTPase Rap. signal induced proliferation associated 1 like 2 57568 NA
MYH10 ENSG00000133026 This gene encodes a member of the myosin superfamily. The protein represents a conventional non-muscle myosin; it should not be confused with the unconventional myosin-10 (MYO10). Myosins are actin-dependent motor proteins with diverse functions including regulation of cytokinesis, cell motility, and cell polarity. Mutations in this gene have been associated with May-Hegglin anomaly and developmental defects in brain and heart. Multiple transcript variants encoding different isoforms have been found for this gene. myosin, heavy chain 10, non-muscle 4628 NA
VASN ENSG00000168140 NA vasorin 114990 NA
FXYD6 ENSG00000137726 This gene encodes a member of the FXYD family of transmembrane proteins. This particular protein encodes phosphohippolin, which likely affects the activity of Na,K-ATPase. Multiple alternatively spliced transcript variants encoding the same protein have been described. Related pseudogenes have been identified on chromosomes 10 and X. Read-through transcripts have been observed between this locus and the downstream sodium/potassium-transporting ATPase subunit gamma (FXYD2, GeneID 486) locus. FXYD domain containing ion transport regulator 6 53826 NA
AC084809.2 ENSG00000226377 NA NA ENSG00000226377 NA
CNFN ENSG00000105427 NA cornifelin 84518 NA
LRRN2 ENSG00000170382 The protein encoded by this gene belongs to the leucine-rich repeat superfamily. This gene was found to be amplified and overexpressed in malignant gliomas. The encoded protein has homology with other proteins that function as cell-adhesion molecules or as signal transduction receptors and is a candidate for the target gene in the 1q32.1 amplicon in malignant gliomas. Two alternatively spliced transcript variants encoding the same protein have been described for this gene. leucine rich repeat neuronal 2 10446 NA
CEACAM1 ENSG00000079385 This gene encodes a member of the carcinoembryonic antigen (CEA) gene family, which belongs to the immunoglobulin superfamily. Two subgroups of the CEA family, the CEA cell adhesion molecules and the pregnancy-specific glycoproteins, are located within a 1.2 Mb cluster on the long arm of chromosome 19. Eleven pseudogenes of the CEA cell adhesion molecule subgroup are also found in the cluster. The encoded protein was originally described in bile ducts of liver as biliary glycoprotein. Subsequently, it was found to be a cell-cell adhesion molecule detected on leukocytes, epithelia, and endothelia. The encoded protein mediates cell adhesion via homophilic as well as heterophilic binding to other proteins of the subgroup. Multiple cellular activities have been attributed to the encoded protein, including roles in the differentiation and arrangement of tissue three-dimensional structure, angiogenesis, apoptosis, tumor suppression, metastasis, and the modulation of innate and adaptive immune responses. Multiple transcript variants encoding different isoforms have been reported, but the full-length nature of all variants has not been defined. carcinoembryonic antigen related cell adhesion molecule 1 634 NA
TMCC3 ENSG00000057704 NA transmembrane and coiled-coil domain family 3 57458 NA
FCGR3B ENSG00000162747 The protein encoded by this gene is a low affinity receptor for the Fc region of gamma immunoglobulins (IgG). The encoded protein acts as a monomer and can bind either monomeric or aggregated IgG. This gene may function to capture immune complexes in the peripheral circulation. Several transcript variants encoding different isoforms have been found for this gene. A highly-similar gene encoding a related protein is also found on chromosome 1. Fc fragment of IgG receptor IIIb 2215 NA
SH2D3C ENSG00000095370 This gene encodes an adaptor protein and member of a cytoplasmic protein family involved in cell migration. The encoded protein contains a putative Src homology 2 (SH2) domain and guanine nucleotide exchange factor-like domain which allows this signaling protein to form a complex with scaffolding protein Crk-associated substrate. Multiple transcript variants encoding different isoforms have been found for this gene. SH2 domain containing 3C 10044 NA
N4BP3 ENSG00000145911 NA NEDD4 binding protein 3 23138 NA
CD34 ENSG00000174059 The protein encoded by this gene may play a role in the attachment of stem cells to the bone marrow extracellular matrix or to stromal cells. This single-pass membrane protein is highly glycosylated and phosphorylated by protein kinase C. Two transcript variants encoding different isoforms have been found for this gene. CD34 molecule 947 NA
RNF125 ENSG00000101695 This gene encodes a novel E3 ubiquitin ligase that contains a RING finger domain in the N-terminus and three zinc-binding and one ubiquitin-interacting motif in the C-terminus. As a result of myristoylation, this protein associates with membranes and is primarily localized to intracellular membrane systems. The encoded protein may function as a positive regulator in the T-cell receptor signaling pathway. ring finger protein 125 54941 NA
TGM3 ENSG00000125780 Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene consists of two polypeptide chains activated from a single precursor protein by proteolysis. The encoded protein is involved in the later stages of cell envelope formation in the epidermis and hair follicle. transglutaminase 3 7053 NA
NEURL1B ENSG00000214357 NA neuralized E3 ubiquitin protein ligase 1B 54492 NA
RP11-688G15.3 ENSG00000258749 NA NA ENSG00000258749 NA
HIGD1B ENSG00000131097 This gene encodes a member of the hypoxia inducible gene 1 (HIG1) domain family. The encoded protein is localized to the cell membrane and has been linked to tumorigenesis and the progression of pituitary adenomas. Alternative splicing results in multiple transcript variants. HIG1 hypoxia inducible domain family member 1B 51751 NA
FGF11 ENSG00000161958 The protein encoded by this gene is a member of the fibroblast growth factor (FGF) family. FGF family members possess broad mitogenic and cell survival activities, and are involved in a variety of biological processes, including embryonic development, cell growth, morphogenesis, tissue repair, tumor growth and invasion. The function of this gene has not yet been determined. The expression pattern of the mouse homolog implies a role in nervous system development. Alternative splicing results in multiple transcript variants. fibroblast growth factor 11 2256 NA
RP11-673E1.3 ENSG00000249741 NA NA ENSG00000249741 NA
DOCK8 ENSG00000107099 This gene encodes a member of the DOCK180 family of guanine nucleotide exchange factors. Guanine nucleotide exchange factors interact with Rho GTPases and are components of intracellular signaling networks. Mutations in this gene result in the autosomal recessive form of the hyper-IgE syndrome. Alternatively spliced transcript variants encoding different isoforms have been described. dedicator of cytokinesis 8 81704 NA
PLBD1 ENSG00000121316 NA phospholipase B domain containing 1 79887 NA
S100A9 ENSG00000163220 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. S100 calcium binding protein A9 6280 NA
SMIM5 ENSG00000204323 NA small integral membrane protein 5 643008 NA
CTHRC1 ENSG00000164932 This locus encodes a protein that may play a role in the cellular response to arterial injury through involvement in vascular remodeling. Mutations at this locus have been associated with Barrett esophagus and esophageal adenocarcinoma. Alternatively spliced transcript variants have been described. collagen triple helix repeat containing 1 115908 NA
C1QB ENSG00000173369 This gene encodes a major constituent of the human complement subcomponent C1q. C1q associates with C1r and C1s in order to yield the first component of the serum complement system. Deficiency of C1q has been associated with lupus erythematosus and glomerulonephritis. C1q is composed of 18 polypeptide chains: six A-chains, six B-chains, and six C-chains. Each chain contains a collagen-like region located near the N terminus and a C-terminal globular region. The A-, B-, and C-chains are arranged in the order A-C-B on chromosome 1. This gene encodes the B-chain polypeptide of human complement subcomponent C1q. complement component 1, q subcomponent, B chain 713 NA
PCP4L1 ENSG00000248485 NA Purkinje cell protein 4 like 1 654790 NA
CSTA ENSG00000121552 The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins, and kininogens. This gene encodes a stefin that functions as a cysteine protease inhibitor, forming tight complexes with papain and the cathepsins B, H, and L. The protein is one of the precursor proteins of cornified cell envelope in keratinocytes and plays a role in epidermal development and maintenance. Stefins have been proposed as prognostic and diagnostic tools for cancer. cystatin A 1475 NA
POPDC2 ENSG00000121577 This gene encodes a member of the POP family of proteins which contain three putative transmembrane domains. This membrane associated protein is predominantly expressed in skeletal and cardiac muscle, and may have an important function in these tissues. popeye domain containing 2 64091 NA
CTD-2201I18.1 ENSG00000249825 NA uncharacterized LOC101929215 101929215 NA
GAS6-AS1 ENSG00000233695 NA GAS6 antisense RNA 1 ENSG00000233695 NA
KLHDC8B ENSG00000185909 This gene encodes a protein which forms a distinct beta-propeller protein structure of kelch domains allowing for protein-protein interactions. Mutations in this gene have been associated with Hodgkin lymphoma. kelch domain containing 8B 200942 NA
NA ENSG00000180672 NA NA NA TRUE
KRT5 ENSG00000186081 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the basal layer of the epidermis with family member KRT14. Mutations in these genes have been associated with a complex of diseases termed epidermolysis bullosa simplex. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 5 3852 NA
RP11-728F11.4 ENSG00000254528 NA NA ENSG00000254528 NA
SLC1A1 ENSG00000106688 This gene encodes a member of the high-affinity glutamate transporters that play an essential role in transporting glutamate across plasma membranes. In brain, these transporters are crucial in terminating the postsynaptic action of the neurotransmitter glutamate, and in maintaining extracellular glutamate concentrations below neurotoxic levels. This transporter also transports aspartate, and mutations in this gene are thought to cause dicarboxylic aminoaciduria, also known as glutamate-aspartate transport defect. solute carrier family 1 member 1 6505 NA
NCF2 ENSG00000116701 This gene encodes neutrophil cytosolic factor 2, the 67-kilodalton cytosolic subunit of the multi-protein NADPH oxidase complex found in neutrophils. This oxidase produces a burst of superoxide which is delivered to the lumen of the neutrophil phagosome. Mutations in this gene, as well as in other NADPH oxidase subunits, can result in chronic granulomatous disease, a disease that causes recurrent infections by catalase-positive organisms. Alternative splicing results in multiple transcript variants encoding different isoforms. neutrophil cytosolic factor 2 4688 NA
TCP11L2 ENSG00000166046 NA t-complex 11 like 2 255394 NA
MYBL1 ENSG00000185697 NA MYB proto-oncogene like 1 4603 NA
THY1 ENSG00000154096 This gene encodes a cell surface glycoprotein and member of the immunoglobulin superfamily of proteins. The encoded protein is involved in cell adhesion and cell communication in numerous cell types, but particularly in cells of the immune and nervous systems. The encoded protein is widely used as a marker for hematopoietic stem cells. This gene may function as a tumor suppressor in nasopharyngeal carcinoma. Alternative splicing results in multiple transcript variants. Thy-1 cell surface antigen 7070 NA
CPT1B ENSG00000205560 The protein encoded by this gene, a member of the carnitine/choline acetyltransferase family, is the rate-controlling enzyme of the long-chain fatty acid beta-oxidation pathway in muscle mitochondria. This enzyme is required for the net transport of long-chain fatty acyl-CoAs from the cytoplasm into the mitochondria. Multiple transcript variants encoding different isoforms have been found for this gene, and read-through transcripts are expressed from the upstream locus that include exons from this gene. carnitine palmitoyltransferase 1B 1375 NA
CHL1 ENSG00000134121 The protein encoded by this gene is a member of the L1 gene family of neural cell adhesion molecules. It is a neural recognition molecule that may be involved in signal transduction pathways. The deletion of one copy of this gene may be responsible for mental defects in patients with 3p- syndrome. This protein may also play a role in the growth of certain cancers. Alternate splicing results in both coding and non-coding variants. cell adhesion molecule L1 like 10752 NA
RP11-334E6.12 ENSG00000263873 NA NA ENSG00000263873 NA
RP11-350G8.9 ENSG00000273110 NA NA ENSG00000273110 NA
MYO15B ENSG00000266714 NA myosin XVB ENSG00000266714 NA
LMO7 ENSG00000136153 This gene encodes a protein containing a calponin homology (CH) domain, a PDZ domain, and a LIM domain, and may be involved in protein-protein interactions. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene, however, the full-length nature of some variants is not known. LIM domain 7 4008 NA
ABI3 ENSG00000108798 This gene encodes a member of an adaptor protein family. Members of this family encode proteins containing a homeobox homology domain, proline rich region and Src-homology 3 (SH3) domain, and are components of the Abi/WAVE complex which regulates actin polymerization. The encoded protein inhibits ectopic metastasis of tumor cells as well as cell migration. This may be accomplished through interaction with p21-activated kinase. Alternative splicing results in multiple transcript variants. ABI family member 3 51225 NA
NCF4 ENSG00000100365 The protein encoded by this gene is a cytosolic regulatory component of the superoxide-producing phagocyte NADPH-oxidase, a multicomponent enzyme system important for host defense. This protein is preferentially expressed in cells of myeloid lineage. It interacts primarily with neutrophil cytosolic factor 2 (NCF2/p67-phox) to form a complex with neutrophil cytosolic factor 1 (NCF1/p47-phox), which further interacts with the small G protein RAC1 and translocates to the membrane upon cell stimulation. This complex then activates flavocytochrome b, the membrane-integrated catalytic core of the enzyme system. The PX domain of this protein can bind phospholipid products of the PI(3) kinase, which suggests its role in PI(3) kinase-mediated signaling events. The phosphorylation of this protein was found to negatively regulate the enzyme activity. Alternatively spliced transcript variants encoding distinct isoforms have been observed. neutrophil cytosolic factor 4 4689 NA
MYO1D ENSG00000176658 NA myosin ID 4642 NA
FN1 ENSG00000115414 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. fibronectin 1 2335 NA
STAB1 ENSG00000010327 This gene encodes a large, transmembrane receptor protein which may function in angiogenesis, lymphocyte homing, cell adhesion, or receptor scavenging. The protein contains 7 fasciclin, 16 epidermal growth factor (EGF)-like, and 2 laminin-type EGF-like domains as well as a C-type lectin-like hyaluronan-binding Link module. The protein is primarily expressed on sinusoidal endothelial cells of liver, spleen, and lymph node. The receptor has been shown to endocytose ligands such as low density lipoprotein, Gram-positive and Gram-negative bacteria, and advanced glycosylation end products. Supporting its possible role as a scavenger receptor, the protein rapidly cycles between the plasma membrane and early endosomes. stabilin 1 23166 NA
OLR1 ENSG00000173391 This gene encodes a low density lipoprotein receptor that belongs to the C-type lectin superfamily. This gene is regulated through the cyclic AMP signaling pathway. The encoded protein binds, internalizes and degrades oxidized low-density lipoprotein. This protein may be involved in the regulation of Fas-induced apoptosis. This protein may play a role as a scavenger receptor. Mutations of this gene have been associated with atherosclerosis, risk of myocardial infarction, and may modify the risk of Alzheimer’s disease. Alternate splicing results in multiple transcript variants. oxidized low density lipoprotein receptor 1 4973 NA
RP11-169D4.2 ENSG00000256633 NA NA ENSG00000256633 NA
TMEM88 ENSG00000167874 NA transmembrane protein 88 92162 NA
IGFBP2 ENSG00000115457 The protein encoded by this gene is one of six similar proteins that bind insulin-like growth factors I and II (IGF-I and IGF-II). The encoded protein can be secreted into the bloodstream, where it binds IGF-I and IGF-II with high affinity, or it can remain intracellular, interacting with many different ligands. High expression levels of this protein promote the growth of several types of tumors and may be predictive of the chances of recovery of the patient. Several transcript variants, one encoding a secreted isoform and the others encoding nonsecreted isoforms, have been found for this gene. insulin like growth factor binding protein 2 3485 NA
EGLN3 ENSG00000129521 NA egl-9 family hypoxia inducible factor 3 112399 NA
SYNM ENSG00000182253 The protein encoded by this gene is an intermediate filament (IF) family member. IF proteins are cytoskeletal proteins that confer resistance to mechanical stress and are encoded by a dispersed multigene family. This protein has been found to form a linkage between desmin, which is a subunit of the IF network, and the extracellular matrix, and provides an important structural support in muscle. Two alternatively spliced variants encoding different isoforms have been described for this gene. synemin 23336 NA
SLC39A14 ENSG00000104635 Zinc is an essential cofactor for hundreds of enzymes. It is involved in protein, nucleic acid, carbohydrate, and lipid metabolism, as well as in the control of gene transcription, growth, development, and differentiation. SLC39A14 belongs to a subfamily of proteins that show structural characteristics of zinc transporters (Taylor and Nicholson, 2003 [PubMed 12659941]). solute carrier family 39 member 14 23516 NA
ATP1B2 ENSG00000129244 The protein encoded by this gene belongs to the family of Na+/K+ and H+/K+ ATPases beta chain proteins, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The beta subunit regulates, through assembly of alpha/beta heterodimers, the number of sodium pumps transported to the plasma membrane. The glycoprotein subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes a beta 2 subunit. Two transcript variants encoding different isoforms have been found for this gene. ATPase Na+/K+ transporting subunit beta 2 482 NA
CTB-79E8.2 ENSG00000253445 NA NA ENSG00000253445 NA
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 14, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
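
The annotation tables in this report can contain rows flagged `notfound = TRUE` and several rows for one queried Ensembl ID, so `out$query` may repeat IDs when it is written to disk. The sketch below shows one way to summarise such a result before export. It is a minimal illustration, not part of the original analysis: `clean_annotations` is a hypothetical helper, and it assumes the result has already been converted to a plain data frame with `query` and (optionally) `notfound` columns, as in `as.data.frame(out)`. The toy rows reuse IDs and symbols that appear in this report.

```r
# Hypothetical helper (an assumption, not part of the original pipeline):
# summarise a queryMany-style result held as a plain data frame.
clean_annotations <- function(ann) {
  # The notfound column may be absent when every query was resolved.
  if (is.null(ann$notfound)) {
    missing <- rep(FALSE, nrow(ann))
  } else {
    # NA in notfound means the query was found; TRUE means it was not.
    missing <- !is.na(ann$notfound) & ann$notfound
  }
  found <- ann[!missing, , drop = FALSE]
  list(
    # Each queried ID once, in order of first appearance.
    unique_queries = unique(as.character(ann$query)),
    # IDs with no annotation record.
    not_found = as.character(ann$query[missing]),
    # IDs that returned more than one annotation record.
    multi_hit = unique(as.character(found$query[duplicated(found$query)])),
    # Annotated rows only.
    found = found
  )
}

# Toy input with the same column names as the tables in this report.
toy <- data.frame(
  query    = c("ENSG00000166819", "ENSG00000239839",
               "ENSG00000239839", "ENSG00000256545"),
  symbol   = c("PLIN1", "DEFA3", "DEFA1", NA),
  notfound = c(NA, NA, NA, TRUE),
  stringsAsFactors = FALSE
)

res <- clean_annotations(toy)
```

Writing `res$unique_queries` instead of `out$query` would give one line per queried gene. Whether unresolved IDs should also be dropped depends on how the downstream gene list is used.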

Factor 15 Annotations

out <- mygene::queryMany(gene_list[15,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id name summary symbol query notfound
3576 C-X-C motif chemokine ligand 8 The protein encoded by this gene is a member of the CXC chemokine family. This chemokine is one of the major mediators of the inflammatory response. This chemokine is secreted by several cell types. It functions as a chemoattractant, and is also a potent angiogenic factor. This gene is believed to play a role in the pathogenesis of bronchiolitis, a common respiratory tract disease caused by viral infection. This gene and other ten members of the CXC chemokine gene family form a chemokine gene cluster in a region mapped to chromosome 4q. CXCL8 ENSG00000169429 NA
5346 perilipin 1 The protein encoded by this gene coats lipid storage droplets in adipocytes, thereby protecting them until they can be broken down by hormone-sensitive lipase. The encoded protein is the major cAMP-dependent protein kinase substrate in adipocytes and, when unphosphorylated, may play a role in the inhibition of lipolysis. Alternatively spliced transcript variants varying in the 5’ UTR, but encoding the same protein, have been found for this gene. PLIN1 ENSG00000166819 NA
126433 F-box protein 27 Members of the F-box protein family, such as FBXO27, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]). FBXO27 ENSG00000161243 NA
81691 exonuclease NEF-sp NA LOC81691 ENSG00000005189 NA
4973 oxidized low density lipoprotein receptor 1 This gene encodes a low density lipoprotein receptor that belongs to the C-type lectin superfamily. This gene is regulated through the cyclic AMP signaling pathway. The encoded protein binds, internalizes and degrades oxidized low-density lipoprotein. This protein may be involved in the regulation of Fas-induced apoptosis. This protein may play a role as a scavenger receptor. Mutations of this gene have been associated with atherosclerosis, risk of myocardial infarction, and may modify the risk of Alzheimer’s disease. Alternate splicing results in multiple transcript variants. OLR1 ENSG00000173391 NA
5054 serpin family E member 1 This gene encodes a member of the serine proteinase inhibitor (serpin) superfamily. This member is the principal inhibitor of tissue plasminogen activator (tPA) and urokinase (uPA), and hence is an inhibitor of fibrinolysis. Defects in this gene are the cause of plasminogen activator inhibitor-1 deficiency (PAI-1 deficiency), and high concentrations of the gene product are associated with thrombophilia. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. SERPINE1 ENSG00000106366 NA
6712 spectrin beta, non-erythrocytic 2 Spectrins are principal components of a cell’s membrane-cytoskeleton and are composed of two alpha and two beta spectrin subunits. The protein encoded by this gene (SPTBN2), is called spectrin beta non-erythrocytic 2 or beta-III spectrin. It is related to, but distinct from, the beta-II spectrin gene which is also known as spectrin beta non-erythrocytic 1 (SPTBN1). SPTBN2 regulates the glutamate signaling pathway by stabilizing the glutamate transporter EAAT4 at the surface of the plasma membrane. Mutations in this gene cause a form of spinocerebellar ataxia, SCA5, that is characterized by neurodegeneration, progressive locomotor incoordination, dysarthria, and uncoordinated eye movements. SPTBN2 ENSG00000173898 NA
3045 hemoglobin subunit delta The delta (HBD) and beta (HBB) genes are normally expressed in the adult: two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin. Two alpha chains plus two delta chains constitute HbA-2, which with HbF comprises the remaining 3% of adult hemoglobin. Five beta-like globin genes are found within a 45 kb cluster on chromosome 11 in the following order: 5’-epsilon–Ggamma–Agamma–delta–beta-3’. Mutations in the delta-globin gene are associated with beta-thalassemia. HBD ENSG00000223609 NA
78995 chromosome 17 open reading frame 53 NA C17orf53 ENSG00000125319 NA
ENSG00000262001 DLGAP1 antisense RNA 2 NA DLGAP1-AS2 ENSG00000262001 NA
ENSG00000267992 NA NA CTB-189B5.3 ENSG00000267992 NA
65009 NDRG family member 4 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that is required for cell cycle progression and survival in primary astrocytes and may be involved in the regulation of mitogenic signalling in vascular smooth muscle cells. Alternative splicing results in multiple transcripts encoding different isoforms. NDRG4 ENSG00000103034 NA
9796 phytanoyl-CoA 2-hydroxylase interacting protein NA PHYHIP ENSG00000168490 NA
729359 perilipin 4 Members of the perilipin family, such as PLIN4, coat intracellular lipid storage droplets (Wolins et al., 2003 [PubMed 12840023]). PLIN4 ENSG00000167676 NA
80162 ATH1, acid trehalase-like 1 (yeast) NA ATHL1 ENSG00000142102 NA
78990 OTU deubiquitinase, ubiquitin aldehyde binding 2 This gene encodes one of several deubiquitylating enzymes. Ubiquitin modification of proteins is needed for their stability and function; to reverse the process, deubiquitylating enzymes remove ubiquitin. This protein contains an OTU domain and binds Ubal (ubiquitin aldehyde); an active cysteine protease site is present in the OTU domain. OTUB2 ENSG00000089723 NA
ENSG00000271857 NA NA RP1-244F24.1 ENSG00000271857 NA
ENSG00000255507 NA NA RP11-535A19.2 ENSG00000255507 NA
5265 serpin family A member 1 The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. SERPINA1 ENSG00000197249 NA
5208 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 2 The protein encoded by this gene is involved in both the synthesis and degradation of fructose-2,6-bisphosphate, a regulatory molecule that controls glycolysis in eukaryotes. The encoded protein has a 6-phosphofructo-2-kinase activity that catalyzes the synthesis of fructose-2,6-bisphosphate, and a fructose-2,6-biphosphatase activity that catalyzes the degradation of fructose-2,6-bisphosphate. This protein regulates fructose-2,6-bisphosphate levels in the heart, while a related enzyme encoded by a different gene regulates fructose-2,6-bisphosphate levels in the liver and muscle. This enzyme functions as a homodimer. Two transcript variants encoding two different isoforms have been found for this gene. PFKFB2 ENSG00000123836 NA
6770 steroidogenic acute regulatory protein The protein encoded by this gene plays a key role in the acute regulation of steroid hormone synthesis by enhancing the conversion of cholesterol into pregnenolone. This protein permits the cleavage of cholesterol into pregnenolone by mediating the transport of cholesterol from the outer mitochondrial membrane to the inner mitochondrial membrane. Mutations in this gene are a cause of congenital lipoid adrenal hyperplasia (CLAH), also called lipoid CAH. A pseudogene of this gene is located on chromosome 13. STAR ENSG00000147465 NA
9123 solute carrier family 16 member 3 Lactic acid and pyruvate transport across plasma membranes is catalyzed by members of the proton-linked monocarboxylate transporter (MCT) family, which has been designated solute carrier family-16. Each MCT appears to have slightly different substrate and inhibitor specificities and transport kinetics, which are related to the metabolic requirements of the tissues in which it is found. The MCTs, which include MCT1 (SLC16A1; MIM 600682) and MCT2 (SLC16A7; MIM 603654), are characterized by 12 predicted transmembrane domains (Price et al., 1998 [PubMed 9425115]). SLC16A3 ENSG00000141526 NA
84518 cornifelin NA CNFN ENSG00000105427 NA
4620 myosin, heavy chain 2, skeletal muscle, adult Myosins are actin-based motor proteins that function in the generation of mechanical force in eukaryotic cells. Muscle myosins are heterohexamers composed of 2 myosin heavy chains and 2 pairs of nonidentical myosin light chains. This gene encodes a member of the class II or conventional myosin heavy chains, and functions in skeletal muscle contraction. This gene is found in a cluster of myosin heavy chain genes on chromosome 17. A mutation in this gene results in inclusion body myopathy-3. Multiple alternatively spliced variants, encoding the same protein, have been identified. MYH2 ENSG00000125414 NA
84886 chromosome 1 open reading frame 198 NA C1orf198 ENSG00000119280 NA
9837 GINS complex subunit 1 The yeast heterotetrameric GINS complex is made up of Sld5 (GINS4; MIM 610611), Psf1, Psf2 (GINS2; MIM 610609), and Psf3 (GINS3; MIM 610610). The formation of the GINS complex is essential for the initiation of DNA replication in yeast and Xenopus egg extracts (Ueno et al., 2005 [PubMed 16287864]). GINS1 ENSG00000101003 NA
ENSG00000232093 NA NA RP11-307C12.11 ENSG00000232093 NA
ENSG00000219435 testis expressed 40 NA TEX40 ENSG00000219435 NA
143903 layilin NA LAYN ENSG00000204381 NA
3983 actin binding LIM protein 1 This gene encodes a cytoskeletal LIM protein that binds to actin filaments via a domain that is homologous to erythrocyte dematin. LIM domains, found in over 60 proteins, play key roles in the regulation of developmental pathways. LIM domains also function as protein-binding interfaces, mediating specific protein-protein interactions. The protein encoded by this gene could mediate such interactions between actin filaments and cytoplasmic targets. Alternatively spliced transcript variants encoding different isoforms have been identified. ABLIM1 ENSG00000099204 NA
219743 trypsin domain containing 1 This gene encodes a protease that removes the N-terminal peroxisomal targeting signal (PTS2) from proteins produced in the cytosol, thereby facilitating their import into the peroxisome. The encoded protein is also capable of removing the C-terminal peroxisomal targeting signal (PTS1) from proteins in the peroxisomal matrix. The full-length protein undergoes self-cleavage to produce shorter, potentially inactive, peptides. Alternative splicing results in multiple transcript variants for this gene. TYSND1 ENSG00000156521 NA
64785 GINS complex subunit 3 This gene encodes a protein subunit of the GINS heterotetrameric complex, which is essential for the initiation of DNA replication and replisome progression in eukaryotes. Alternatively spliced transcript variants encoding distinct isoforms have been described. GINS3 ENSG00000181938 NA
4008 LIM domain 7 This gene encodes a protein containing a calponin homology (CH) domain, a PDZ domain, and a LIM domain, and may be involved in protein-protein interactions. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene, however, the full-length nature of some variants is not known. LMO7 ENSG00000136153 NA
ENSG00000222112 RNA, 7SK small nuclear pseudogene 16 NA RN7SKP16 ENSG00000222112 NA
3039 hemoglobin subunit alpha 1 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. HBA1 ENSG00000206172 NA
4632 myosin light chain 1 Myosin is a hexameric ATPase cellular motor protein. It is composed of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable regulatory light chains. This gene encodes a myosin alkali light chain expressed in fast skeletal muscle. Two transcript variants have been identified for this gene. MYL1 ENSG00000168530 NA
84793 FOXD2 antisense RNA 1 (head to head) NA FOXD2-AS1 ENSG00000237424 NA
ENSG00000267379 NA NA CTC-548K16.5 ENSG00000267379 NA
6498 SKI-like proto-oncogene The protein encoded by this gene is a component of the SMAD pathway, which regulates cell growth and differentiation through transforming growth factor-beta (TGFB). In the absence of ligand, the encoded protein binds to the promoter region of TGFB-responsive genes and recruits a nuclear repressor complex. TGFB signaling causes SMAD3 to enter the nucleus and degrade this protein, allowing these genes to be activated. Four transcript variants encoding three different isoforms have been found for this gene. SKIL ENSG00000136603 NA
ENSG00000253392 NA NA AC006277.2 ENSG00000253392 NA
3991 lipase E, hormone sensitive type The protein encoded by this gene has a long and a short form, generated by use of alternative translational start codons. The long form is expressed in steroidogenic tissues such as testis, where it converts cholesteryl esters to free cholesterol for steroid hormone production. The short form is expressed in adipose tissue, among others, where it hydrolyzes stored triglycerides to free fatty acids. LIPE ENSG00000079435 NA
6004 regulator of G-protein signaling 16 The protein encoded by this gene belongs to the ‘regulator of G protein signaling’ family. It inhibits signal transduction by increasing the GTPase activity of G protein alpha subunits. It also may play a role in regulating the kinetics of signaling in the phototransduction cascade. RGS16 ENSG00000143333 NA
6470 serine hydroxymethyltransferase 1 This gene encodes the cytosolic form of serine hydroxymethyltransferase, a pyridoxal phosphate-containing enzyme that catalyzes the reversible conversion of serine and tetrahydrofolate to glycine and 5,10-methylene tetrahydrofolate. This reaction provides one-carbon units for synthesis of methionine, thymidylate, and purines in the cytoplasm. This gene is located within the Smith-Magenis syndrome region on chromosome 17. A pseudogene of this gene is located on the short arm of chromosome 1. Alternative splicing results in multiple transcript variants. SHMT1 ENSG00000176974 NA
7042 transforming growth factor beta 2 This gene encodes a member of the transforming growth factor beta (TGFB) family of cytokines, which are multifunctional peptides that regulate proliferation, differentiation, adhesion, migration, and other functions in many cell types by transducing their signal through combinations of transmembrane type I and type II receptors (TGFBR1 and TGFBR2) and their downstream effectors, the SMAD proteins. Disruption of the TGFB/SMAD pathway has been implicated in a variety of human cancers. The encoded protein is secreted and has suppressive effects of interleukin-2 dependent T-cell growth. Translocation t(1;7)(q41;p21) between this gene and HDAC9 is associated with Peters’ anomaly, a congenital defect of the anterior chamber of the eye. The knockout mice lacking this gene show perinatal mortality and a wide range of developmental, including cardiac, defects. Alternatively spliced transcript variants encoding different isoforms have been identified. TGFB2 ENSG00000092969 NA
NA NA NA NA ENSG00000256545 TRUE
91442 Fanconi anemia core complex associated protein 24 FAAP24 is a component of the Fanconi anemia (FA) core complex (see MIM 227650), which plays a crucial role in DNA damage response (Ciccia et al., 2007 [PubMed 17289582]). FAAP24 ENSG00000131944 NA
ENSG00000256462 NA NA RP11-116G8.5 ENSG00000256462 NA
65985 acetoacetyl-CoA synthetase NA AACS ENSG00000081760 NA
7043 transforming growth factor beta 3 This gene encodes a member of the TGF-beta family of proteins. The encoded protein is secreted and is involved in embryogenesis and cell differentiation. Defects in this gene are a cause of familial arrhythmogenic right ventricular dysplasia 1. TGFB3 ENSG00000119699 NA
2919 C-X-C motif chemokine ligand 1 This antimicrobial gene encodes a member of the CXC subfamily of chemokines. The encoded protein is a secreted growth factor that signals through the G-protein coupled receptor, CXC receptor 2. This protein plays a role in inflammation and as a chemoattractant for neutrophils. Aberrant expression of this protein is associated with the growth and progression of certain tumors. A naturally occurring processed form of this protein has increased chemotactic activity. Alternate splicing results in coding and non-coding variants of this gene. A pseudogene of this gene is found on chromosome 4. CXCL1 ENSG00000163739 NA
6347 C-C motif chemokine ligand 2 This gene is one of several cytokine genes clustered on the q-arm of chromosome 17. Chemokines are a superfamily of secreted proteins involved in immunoregulatory and inflammatory processes. The superfamily is divided into four subfamilies based on the arrangement of N-terminal cysteine residues of the mature peptide. This chemokine is a member of the CC subfamily which is characterized by two adjacent cysteine residues. This cytokine displays chemotactic activity for monocytes and basophils but not for neutrophils or eosinophils. It has been implicated in the pathogenesis of diseases characterized by monocytic infiltrates, like psoriasis, rheumatoid arthritis and atherosclerosis. It binds to chemokine receptors CCR2 and CCR4. CCL2 ENSG00000108691 NA
ENSG00000254272 NA NA RP11-382J24.2 ENSG00000254272 NA
79827 CXADR-like membrane protein This gene encodes a type I transmembrane protein that is localized to junctional complexes between endothelial and epithelial cells and may have a role in cell-cell adhesion. Expression of this gene in white adipose tissue is implicated in adipocyte maturation and development of obesity. This gene is also essential for normal intestinal development and mutations in the gene are associated with congenital short bowel syndrome. CLMP ENSG00000166250 NA
113146 AHNAK nucleoprotein 2 NA AHNAK2 ENSG00000185567 NA
7447 visinin like 1 This gene is a member of the visinin/recoverin subfamily of neuronal calcium sensor proteins. The encoded protein is strongly expressed in granule cells of the cerebellum where it associates with membranes in a calcium-dependent manner and modulates intracellular signaling pathways of the central nervous system by directly or indirectly regulating the activity of adenylyl cyclase. Alternatively spliced transcript variants have been observed, but their full-length nature has not been determined. VSNL1 ENSG00000163032 NA
1824 desmocollin 2 This gene encodes a member of the desmocollin protein subfamily. Desmocollins, along with desmogleins, are cadherin-like transmembrane glycoproteins that are major components of the desmosome. Desmosomes are cell-cell junctions that help resist shearing forces and are found in high concentrations in cells subject to mechanical stress. This gene is found in a cluster with other desmocollin family members on chromosome 18. Mutations in this gene are associated with arrhythmogenic right ventricular dysplasia-11, and reduced protein expression has been described in several types of cancer. Alternative splicing results in multiple transcript variants. DSC2 ENSG00000134755 NA
5284 polymeric immunoglobulin receptor This gene is a member of the immunoglobulin superfamily. The encoded poly-Ig receptor binds polymeric immunoglobulin molecules at the basolateral surface of epithelial cells; the complex is then transported across the cell to be secreted at the apical surface. A significant association was found between immunoglobulin A nephropathy and several SNPs in this gene. PIGR ENSG00000162896 NA
5329 plasminogen activator, urokinase receptor This gene encodes the receptor for urokinase plasminogen activator and, given its role in localizing and promoting plasmin formation, likely influences many normal and pathological processes related to cell-surface plasminogen activation and localized degradation of the extracellular matrix. It binds both the proprotein and mature forms of urokinase plasminogen activator and permits the activation of the receptor-bound pro-enzyme by plasmin. The protein lacks transmembrane or cytoplasmic domains and may be anchored to the plasma membrane by a glycosyl-phosphatidylinositol (GPI) moiety following cleavage of the nascent polypeptide near its carboxy-terminus. However, a soluble protein is also produced in some cell types. Alternative splicing results in multiple transcript variants encoding different isoforms. The proprotein experiences several post-translational cleavage reactions that have not yet been fully defined. PLAUR ENSG00000011422 NA
1668 defensin alpha 3 Defensins are a family of antimicrobial and cytotoxic peptides thought to be involved in host defense. They are abundant in the granules of neutrophils and also found in the epithelia of mucosal surfaces such as those of the intestine, respiratory tract, urinary tract, and vagina. Members of the defensin family are highly similar in protein sequence and distinguished by a conserved cysteine motif. The protein encoded by this gene, defensin, alpha 3, is found in the microbicidal granules of neutrophils and likely plays a role in phagocyte-mediated host defense. Several alpha defensin genes are clustered on chromosome 8. This gene differs from defensin, alpha 1 by only one amino acid. This gene and the gene encoding defensin, alpha 1 are both subject to copy number variation. DEFA3 ENSG00000239839 NA
728358 defensin alpha 1B Defensins are a family of antimicrobial and cytotoxic peptides thought to be involved in host defense. They are abundant in the granules of neutrophils and also found in the epithelia of mucosal surfaces such as those of the intestine, respiratory tract, urinary tract, and vagina. Members of the defensin family are highly similar in protein sequence and distinguished by a conserved cysteine motif. The protein encoded by this gene, defensin, alpha 1, is found in the microbicidal granules of neutrophils and likely plays a role in phagocyte-mediated host defense. Several alpha defensin genes are clustered on chromosome 8. This gene differs from defensin, alpha 3 by only one amino acid. This gene and the gene encoding defensin, alpha 3 are both subject to copy number variation. Two transcript variants encoding different isoforms have been found for this gene. DEFA1B ENSG00000239839 NA
1667 defensin alpha 1 Defensins are a family of antimicrobial and cytotoxic peptides thought to be involved in host defense. They are abundant in the granules of neutrophils and also found in the epithelia of mucosal surfaces such as those of the intestine, respiratory tract, urinary tract, and vagina. Members of the defensin family are highly similar in protein sequence and distinguished by a conserved cysteine motif. The protein encoded by this gene, defensin, alpha 1, is found in the microbicidal granules of neutrophils and likely plays a role in phagocyte-mediated host defense. Several alpha defensin genes are clustered on chromosome 8. This gene differs from defensin, alpha 3 by only one amino acid. This gene and the gene encoding defensin, alpha 3 are both subject to copy number variation. DEFA1 ENSG00000239839 NA
80336 poly(A) binding protein cytoplasmic 1 like NA PABPC1L ENSG00000101104 NA
7135 troponin I1, slow skeletal type Troponin proteins associate with tropomyosin and regulate the calcium sensitivity of the myofibril contractile apparatus of striated muscles. Troponin I (TnI), along with troponin T (TnT) and troponin C (TnC), is one of 3 subunits that form the troponin complex of the thin filaments of striated muscle. TnI is the inhibitory subunit; blocking actin-myosin interactions and thereby mediating striated muscle relaxation. The TnI subfamily contains three genes: TnI-skeletal-fast-twitch, TnI-skeletal-slow-twitch, and TnI-cardiac. The TnI-fast and TnI-slow genes are expressed in fast-twitch and slow-twitch skeletal muscle fibers, respectively, while the TnI-cardiac gene is expressed exclusively in cardiac muscle tissue. This gene encodes the Troponin-I-skeletal-slow-twitch protein. This gene is expressed in cardiac and skeletal muscle during early development but is restricted to slow-twitch skeletal muscle fibers in adults. The encoded protein prevents muscle contraction by inhibiting calcium-mediated conformational changes in actin-myosin complexes. TNNI1 ENSG00000159173 NA
65989 delta like non-canonical Notch ligand 2 NA DLK2 ENSG00000171462 NA
ENSG00000230530 LIMD1 antisense RNA 1 NA LIMD1-AS1 ENSG00000230530 NA
2064 erb-b2 receptor tyrosine kinase 2 This gene encodes a member of the epidermal growth factor (EGF) receptor family of receptor tyrosine kinases. This protein has no ligand binding domain of its own and therefore cannot bind growth factors. However, it does bind tightly to other ligand-bound EGF receptor family members to form a heterodimer, stabilizing ligand binding and enhancing kinase-mediated activation of downstream signalling pathways, such as those involving mitogen-activated protein kinase and phosphatidylinositol-3 kinase. Allelic variations at amino acid positions 654 and 655 of isoform a (positions 624 and 625 of isoform b) have been reported, with the most common allele, Ile654/Ile655, shown here. Amplification and/or overexpression of this gene has been reported in numerous cancers, including breast and ovarian tumors. Alternative splicing results in several additional transcript variants, some encoding different isoforms and others that have not been fully characterized. ERBB2 ENSG00000141736 NA
976 adhesion G protein-coupled receptor E5 This gene encodes a member of the EGF-TM7 subfamily of adhesion G protein-coupled receptors, which mediate cell-cell interactions. These proteins are cleaved by self-catalytic proteolysis into a large extracellular subunit and seven-span transmembrane subunit, which associate at the cell surface as a receptor complex. The encoded protein may play a role in cell adhesion as well as leukocyte recruitment, activation and migration, and contains multiple extracellular EGF-like repeats which mediate binding to chondroitin sulfate and the cell surface complement regulatory protein CD55. Expression of this gene may play a role in the progression of several types of cancer. Alternatively spliced transcript variants encoding multiple isoforms with 3 to 5 EGF-like repeats have been observed for this gene. This gene is found in a cluster with other EGF-TM7 genes on the short arm of chromosome 19. ADGRE5 ENSG00000123146 NA
NA NA NA NA ENSG00000256005 TRUE
1286 collagen type IV alpha 4 chain This gene encodes one of the six subunits of type IV collagen, the major structural component of basement membranes. This particular collagen IV subunit, however, is only found in a subset of basement membranes. Like the other members of the type IV collagen gene family, this gene is organized in a head-to-head conformation with another type IV collagen gene so that each gene pair shares a common promoter. Mutations in this gene are associated with type II autosomal recessive Alport syndrome (hereditary glomerulonephropathy) and with familial benign hematuria (thin basement membrane disease). Two transcripts, differing only in their transcription start sites, have been identified for this gene and, as is common for collagen genes, multiple polyadenylation sites are found in the 3’ UTR. COL4A4 ENSG00000081052 NA
57467 hedgehog acyltransferase-like NA HHATL ENSG00000010282 NA
5967 regenerating family member 1 alpha This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. REG1A ENSG00000115386 NA
55659 zinc finger protein 416 NA ZNF416 ENSG00000083817 NA
4900 neurogranin Neurogranin (NRGN) is the human homolog of the neuron-specific rat RC3/neurogranin gene. This gene encodes a postsynaptic protein kinase substrate that binds calmodulin in the absence of calcium. The NRGN gene contains four exons and three introns. The exons 1 and 2 encode the protein and exons 3 and 4 contain untranslated sequences. It is suggested that the NRGN is a direct target for thyroid hormone in human brain, and that control of expression of this gene could underlay many of the consequences of hypothyroidism on mental states during development as well as in adult subjects. NRGN ENSG00000154146 NA
8639 amine oxidase, copper containing 3 This gene encodes a member of the semicarbazide-sensitive amine oxidase family. Copper amine oxidases catalyze the oxidative conversion of amines to aldehydes in the presence of copper and quinone cofactor. The encoded protein is localized to the cell surface, has adhesive properties as well as monoamine oxidase activity, and may be involved in leukocyte trafficking. Alterations in levels of the encoded protein may be associated with many diseases, including diabetes mellitus. A pseudogene of this gene has been described and is located approximately 9-kb downstream on the same chromosome. Alternative splicing results in multiple transcript variants. AOC3 ENSG00000131471 NA
32 acetyl-CoA carboxylase beta Acetyl-CoA carboxylase (ACC) is a complex multifunctional enzyme system. ACC is a biotin-containing enzyme which catalyzes the carboxylation of acetyl-CoA to malonyl-CoA, the rate-limiting step in fatty acid synthesis. ACC-beta is thought to control fatty acid oxidation by means of the ability of malonyl-CoA to inhibit carnitine-palmitoyl-CoA transferase I, the rate-limiting step in fatty acid uptake and oxidation by mitochondria. ACC-beta may be involved in the regulation of fatty acid oxidation, rather than fatty acid biosynthesis. There is evidence for the presence of two ACC-beta isoforms. ACACB ENSG00000076555 NA
9235 interleukin 32 This gene encodes a member of the cytokine family. The protein contains a tyrosine sulfation site, 3 potential N-myristoylation sites, multiple putative phosphorylation sites, and an RGD cell-attachment sequence. Expression of this protein is increased after the activation of T-cells by mitogens or the activation of NK cells by IL-2. This protein induces the production of TNFalpha from macrophage cells. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. IL32 ENSG00000008517 NA
2770 G protein subunit alpha i1 Guanine nucleotide binding proteins are heterotrimeric signal-transducing molecules consisting of alpha, beta, and gamma subunits. The alpha subunit binds guanine nucleotide, can hydrolyze GTP, and can interact with other proteins. The protein encoded by this gene represents the alpha subunit of an inhibitory complex. The encoded protein is part of a complex that responds to beta-adrenergic signals by inhibiting adenylate cyclase. Two transcript variants encoding different isoforms have been found for this gene. GNAI1 ENSG00000127955 NA
ENSG00000177337 DLGAP1 antisense RNA 1 NA DLGAP1-AS1 ENSG00000177337 NA
8480 ribonucleic acid export 1 Mutations in the Schizosaccharomyces pombe Rae1 and Saccharomyces cerevisiae Gle2 genes have been shown to result in accumulation of poly(A)-containing mRNA in the nucleus, suggesting that the encoded proteins are involved in RNA export. The protein encoded by this gene is a homolog of yeast Rae1. It contains four WD40 motifs, and has been shown to localize to distinct foci in the nucleoplasm, to the nuclear rim, and to meshwork-like structures throughout the cytoplasm. This gene is thought to be involved in nucleocytoplasmic transport, and in directly or indirectly attaching cytoplasmic mRNPs to the cytoskeleton. Alternatively spliced transcript variants encoding the same protein have been found for this gene. RAE1 ENSG00000101146 NA
ENSG00000267249 NA NA RP11-973H7.3 ENSG00000267249 NA
ENSG00000269463 NA NA RP11-727F15.13 ENSG00000269463 NA
8365 histone cluster 1, H4h Histones are basic nuclear proteins that are responsible for the nucleosome structure of the chromosomal fiber in eukaryotes. Two molecules of each of the four core histones (H2A, H2B, H3, and H4) form an octamer, around which approximately 146 bp of DNA is wrapped in repeating units, called nucleosomes. The linker histone, H1, interacts with linker DNA between nucleosomes and functions in the compaction of chromatin into higher order structures. This gene is intronless and encodes a replication-dependent histone that is a member of the histone H4 family. Transcripts from this gene lack polyA tails but instead contain a palindromic termination element. This gene is found in the large histone gene cluster on chromosome 6. HIST1H4H ENSG00000158406 NA
245973 ATPase H+ transporting V1 subunit C2 This gene encodes a component of vacuolar ATPase (V-ATPase), a multisubunit enzyme that mediates acidification of eukaryotic intracellular organelles. V-ATPase dependent organelle acidification is necessary for such intracellular processes as protein sorting, zymogen activation, receptor-mediated endocytosis, and synaptic vesicle proton gradient generation. V-ATPase is composed of a cytosolic V1 domain and a transmembrane V0 domain. The V1 domain consists of three A, three B, and two G subunits, as well as a C, D, E, F, and H subunit. The V1 domain contains the ATP catalytic site. This gene encodes alternate transcriptional splice variants, encoding different V1 domain C subunit isoforms. ATP6V1C2 ENSG00000143882 NA
9052 G protein-coupled receptor class C group 5 member A This gene encodes a member of the type 3 G protein-coupling receptor family, characterized by the signature 7-transmembrane domain motif. The encoded protein may be involved in interaction between retinoic acid and G protein signalling pathways. Retinoic acid plays a critical role in development, cellular growth, and differentiation. This gene may play a role in embryonic development and epithelial cell differentiation. GPRC5A ENSG00000013588 NA
112755 syntaxin 1B The protein encoded by this gene belongs to a family of proteins thought to play a role in the exocytosis of synaptic vesicles. Vesicle exocytosis releases vesicular contents and is important to various cellular functions. For instance, the secretion of transmitters from neurons plays an important role in synaptic transmission. After exocytosis, the membrane and proteins from the vesicle are retrieved from the plasma membrane through the process of endocytosis. Mutations in this gene have been identified as one cause of fever-associated epilepsy syndromes. A possible link between this gene and Parkinson’s disease has also been suggested. STX1B ENSG00000099365 NA
ENSG00000262251 NA NA RP11-199F11.2 ENSG00000262251 NA
6440 surfactant protein C This gene encodes the pulmonary-associated surfactant protein C (SPC), an extremely hydrophobic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 2, also called pulmonary alveolar proteinosis due to surfactant protein C deficiency, and are associated with interstitial lung disease in older infants, children, and adults. Alternatively spliced transcript variants encoding different protein isoforms have been identified. SFTPC ENSG00000168484 NA
79000 aurora kinase A and ninein interacting protein NA AUNIP ENSG00000127423 NA
253635 G-patch domain containing 11 NA GPATCH11 ENSG00000152133 NA
27106 arrestin domain containing 2 NA ARRDC2 ENSG00000105643 NA
124976 sphingolipid transporter 2 NA SPNS2 ENSG00000183018 NA
3048 hemoglobin subunit gamma 2 The gamma globin genes (HBG1 and HBG2) are normally expressed in the fetal liver, spleen and bone marrow. Two gamma chains together with two alpha chains constitute fetal hemoglobin (HbF) which is normally replaced by adult hemoglobin (HbA) at birth. In some beta-thalassemias and related conditions, gamma chain production continues into adulthood. The two types of gamma chains differ at residue 136 where glycine is found in the G-gamma product (HBG2) and alanine is found in the A-gamma product (HBG1). The former is predominant at birth. The order of the genes in the beta-globin cluster is: 5’- epsilon – gamma-G – gamma-A – delta – beta–3’. HBG2 ENSG00000196565 NA
134 adenosine A1 receptor The protein encoded by this gene is an adenosine receptor that belongs to the G-protein coupled receptor 1 family. There are 3 types of adenosine receptors, each with a specific pattern of ligand binding and tissue distribution, and together they regulate a diverse set of physiologic functions. The type A1 receptors inhibit adenylyl cyclase, and play a role in the fertilization process. Animal studies also suggest a role for A1 receptors in kidney function and ethanol intoxication. Transcript variants with alternative splicing in the 5’ UTR have been found for this gene. ADORA1 ENSG00000163485 NA
441478 NOTCH-regulated ankyrin repeat protein NA NRARP ENSG00000198435 NA
151246 shugoshin 2 NA SGO2 ENSG00000163535 NA
1584 cytochrome P450 family 11 subfamily B member 1 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the mitochondrial inner membrane and is involved in the conversion of progesterone to cortisol in the adrenal cortex. Mutations in this gene cause congenital adrenal hyperplasia due to 11-beta-hydroxylase deficiency. Transcript variants encoding different isoforms have been noted for this gene. CYP11B1 ENSG00000160882 NA
440689 histone cluster 2, H2bf Histones are basic nuclear proteins that are responsible for the nucleosome structure of the chromosomal fiber in eukaryotes. This structure consists of approximately 146 bp of DNA wrapped around a nucleosome, an octamer composed of pairs of each of the four core histones (H2A, H2B, H3, and H4). The chromatin fiber is further compacted through the interaction of a linker histone, H1, with the DNA between the nucleosomes to form higher order chromatin structures. This gene encodes a replication-dependent histone that is a member of the histone H2B family and is found in a histone cluster on chromosome 1. HIST2H2BF ENSG00000203814 NA
NA NA NA NA ENSG00000156750 TRUE
78986 dual specificity phosphatase 26 (putative) This gene encodes a member of the tyrosine phosphatase family of proteins and exhibits dual specificity by dephosphorylating tyrosine as well as serine and threonine residues. This gene has been described as both a tumor suppressor and an oncogene depending on the cellular context. This protein may regulate neuronal proliferation and has been implicated in the progression of glioblastoma through its ability to dephosphorylate the p53 tumor suppressor. Alternative splicing results in multiple transcript variants. DUSP26 ENSG00000133878 NA
64284 RAB17, member RAS oncogene family The Rab subfamily of small GTPases plays an important role in the regulation of membrane trafficking. RAB17 is an epithelial cell-specific GTPase (Lutcke et al., 1993 [PubMed 8486736]). RAB17 ENSG00000124839 NA
8013 nuclear receptor subfamily 4 group A member 3 This gene encodes a member of the steroid-thyroid hormone-retinoid receptor superfamily. The encoded protein may act as a transcriptional activator. The protein can efficiently bind the NGFI-B Response Element (NBRE). Three different versions of extraskeletal myxoid chondrosarcomas (EMCs) are the result of reciprocal translocations between this gene and other genes. The translocation breakpoints are associated with Nuclear Receptor Subfamily 4, Group A, Member 3 (on chromosome 9) and either Ewing Sarcoma Breakpoint Region 1 (on chromosome 22), RNA Polymerase II, TATA Box-Binding Protein-Associated Factor, 68-KD (on chromosome 17), or Transcription factor 12 (on chromosome 15). Multiple transcript variants encoding different isoforms have been found for this gene. NR4A3 ENSG00000119508 NA
3040 hemoglobin subunit alpha 2 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. HBA2 ENSG00000188536 NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_",15,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
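
The annotation table above includes a query (ENSG00000156750) that mygene could not match, flagged by notfound = TRUE, and the write.table call still writes it to the gene list. If only matched genes are wanted downstream, a minimal sketch of one way to filter them follows. It assumes that notfound is NA for matched rows and TRUE for unmatched ones, as in the tables shown here. The data frame out_toy and the helper matched_queries are hypothetical names used for illustration only, and the output goes to a temporary file rather than to ../utilities/.

```r
# Toy stand-in for the result of mygene::queryMany(); the three rows are
# copied from the annotation table above (two matched, one not found).
out_toy <- data.frame(
  query    = c("ENSG00000188536", "ENSG00000156750", "ENSG00000119508"),
  symbol   = c("HBA2", NA, "NR4A3"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return only the query IDs that were matched.
# Assumes `notfound` is NA for matched rows and TRUE for unmatched rows;
# if the column is absent, every query is treated as matched.
matched_queries <- function(res) {
  if (!"notfound" %in% names(res)) return(res$query)
  res$query[is.na(res$notfound) | !res$notfound]
}

matched <- matched_queries(out_toy)

# Write one ID per line, mirroring the write.table call in the document,
# but to a temporary file so the sketch has no side effects on the project.
out_file <- tempfile(fileext = ".txt")
write.table(matched, out_file, col.names = FALSE,
            row.names = FALSE, quote = FALSE)
```

Whether unmatched Ensembl IDs should be dropped or kept depends on the downstream tool; keeping them, as the original call does, preserves the full top-gene list for each factor.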

Factor 16 Annotations

out <- mygene::queryMany(gene_list[16,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name X_id symbol query summary notfound
immunoglobulin lambda constant 3 (Kern-Oz+ marker) ENSG00000211679 IGLC3 ENSG00000211679 NA NA
immunoglobulin heavy constant mu ENSG00000211899 IGHM ENSG00000211899 NA NA
immunoglobulin lambda constant 1 (Mcg marker) ENSG00000211675 IGLC1 ENSG00000211675 NA NA
immunoglobulin lambda like polypeptide 5 100423062 IGLL5 ENSG00000254709 This gene encodes one of the immunoglobulin lambda-like polypeptides. It is located within the immunoglobulin lambda locus but it does not require somatic rearrangement for expression. The first exon of this gene is unrelated to immunoglobulin variable genes; the second and third exons are the immunoglobulin lambda joining 1 and the immunoglobulin lambda constant 1 gene segments. Alternative splicing results in multiple transcript variants. NA
immunoglobulin lambda constant 2 (Kern-Oz- marker) ENSG00000211677 IGLC2 ENSG00000211677 NA NA
immunoglobulin heavy constant alpha 1 ENSG00000211895 IGHA1 ENSG00000211895 NA NA
immunoglobulin heavy constant alpha 2 (A2m marker) ENSG00000211890 IGHA2 ENSG00000211890 NA NA
sialic acid binding Ig like lectin 10 89790 SIGLEC10 ENSG00000142512 SIGLECs are members of the immunoglobulin superfamily that are expressed on the cell surface. Most SIGLECs have 1 or more cytoplasmic immune receptor tyrosine-based inhibitory motifs, or ITIMs. SIGLECs are typically expressed on cells of the innate immune system, with the exception of the B-cell expressed SIGLEC6 (MIM 604405). NA
NA ENSG00000254760 CTD-2616J11.3 ENSG00000254760 NA NA
NA ENSG00000255441 CTD-2616J11.2 ENSG00000255441 NA NA
joining chain of multimeric IgA and IgM 3512 JCHAIN ENSG00000132465 NA NA
cathepsin S 1520 CTSS ENSG00000163131 The protein encoded by this gene, a member of the peptidase C1 family, is a lysosomal cysteine proteinase that may participate in the degradation of antigenic proteins to peptides for presentation on MHC class II molecules. The encoded protein can function as an elastase over a broad pH range in alveolar macrophages. Alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. NA
serpin family A member 1 5265 SERPINA1 ENSG00000197249 The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. NA
apolipoprotein B receptor 55911 APOBR ENSG00000184730 Apolipoprotein B48 receptor is a macrophage receptor that binds to the apolipoprotein B48 of dietary triglyceride (TG)-rich lipoproteins. This receptor may provide essential lipids, lipid-soluble vitamins and other nutrients to reticuloendothelial cells. If overwhelmed with elevated plasma triglyceride, the apolipoprotein B48 receptor may contribute to foam cell formation, endothelial dysfunction, and atherothrombogenesis. NA
cytochrome P450 family 2 subfamily S member 1 29785 CYP2S1 ENSG00000167600 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. In rodents, the homologous protein has been shown to metabolize certain carcinogens; however, the specific function of the human protein has not been determined. NA
NA ENSG00000266903 CTB-171A8.1 ENSG00000266903 NA NA
NA ENSG00000253364 RP11-731F5.2 ENSG00000253364 NA NA
immunoglobulin heavy constant gamma 3 (G3m marker) ENSG00000211897 IGHG3 ENSG00000211897 NA NA
galectin 4 3960 LGALS4 ENSG00000171747 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. The expression of this gene is restricted to small intestine, colon, and rectum, and it is underexpressed in colorectal cancer. NA
thromboxane A synthase 1 6916 TBXAS1 ENSG00000059377 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. However, this protein is considered a member of the cytochrome P450 superfamily on the basis of sequence similarity rather than functional similarity. This endoplasmic reticulum membrane protein catalyzes the conversion of prostaglandin H2 to thromboxane A2, a potent vasoconstrictor and inducer of platelet aggregation. The enzyme plays a role in several pathophysiological processes including hemostasis, cardiovascular disease, and stroke. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
immunoglobulin heavy constant gamma 1 (G1m marker) ENSG00000211896 IGHG1 ENSG00000211896 NA NA
placenta specific 8 51316 PLAC8 ENSG00000145287 NA NA
immunoglobulin heavy constant gamma 2 (G2m marker) ENSG00000211893 IGHG2 ENSG00000211893 NA NA
keratin 1 3848 KRT1 ENSG00000167768 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. NA
claudin 7 1366 CLDN7 ENSG00000181885 This gene encodes a member of the claudin family. Claudins are integral membrane proteins and components of tight junction strands. Tight junction strands serve as a physical barrier to prevent solutes and water from passing freely through the paracellular space between epithelial or endothelial cell sheets, and also play critical roles in maintaining cell polarity and signal transductions. Differential expression of this gene has been observed in different types of malignancies, including breast cancer, ovarian cancer, hepatocellular carcinomas, urinary tumors, prostate cancer, lung cancer, head and neck cancers, thyroid carcinomas, etc. Alternatively spliced transcript variants encoding different isoforms have been found. NA
epithelial cell adhesion molecule 4072 EPCAM ENSG00000119888 This gene encodes a carcinoma-associated antigen and is a member of a family that includes at least two type I membrane proteins. This antigen is expressed on most normal epithelial cells and gastrointestinal carcinomas and functions as a homotypic calcium-independent cell adhesion molecule. The antigen is being used as a target for immunotherapy treatment of human carcinomas. Mutations in this gene result in congenital tufting enteropathy. NA
sulfotransferase family 1A member 2 6799 SULT1A2 ENSG00000197165 Sulfotransferase enzymes catalyze the sulfate conjugation of many hormones, neurotransmitters, drugs, and xenobiotic compounds. These cytosolic enzymes are different in their tissue distributions and substrate specificities. The gene structure (number and length of exons) is similar among family members. This gene encodes one of two phenol sulfotransferases with thermostable enzyme activity. Two alternatively spliced variants that encode the same protein have been described. NA
purinergic receptor P2X 1 5023 P2RX1 ENSG00000108405 The protein encoded by this gene belongs to the P2X family of G-protein-coupled receptors. These proteins can form homo- and heterotrimers and function as ATP-gated ion channels and mediate rapid and selective permeability to cations. This protein is primarily localized to smooth muscle, where it binds ATP and mediates synaptic transmission between neurons and from neurons to smooth muscle, and may be responsible for sympathetic vasoconstriction in small arteries, arterioles and vas deferens. Mouse studies suggest that this receptor is essential for normal male reproductive function. This protein may also be involved in promoting apoptosis. NA
CD79a molecule 973 CD79A ENSG00000105369 The B lymphocyte antigen receptor is a multimeric complex that includes the antigen-specific component, surface immunoglobulin (Ig). Surface Ig non-covalently associates with two other proteins, Ig-alpha and Ig-beta, which are necessary for expression and function of the B-cell antigen receptor. This gene encodes the Ig-alpha protein of the B-cell antigen component. Alternatively spliced transcript variants encoding different isoforms have been described. NA
serine peptidase inhibitor, Kazal type 1 6690 SPINK1 ENSG00000164266 The protein encoded by this gene is a trypsin inhibitor, which is secreted from pancreatic acinar cells into pancreatic juice. It is thought to function in the prevention of trypsin-catalyzed premature activation of zymogens within the pancreas and the pancreatic duct. Mutations in this gene are associated with hereditary pancreatitis and tropical calcific pancreatitis. NA
stratifin 2810 SFN ENSG00000175793 NA NA
interleukin 1 beta 3553 IL1B ENSG00000125538 The protein encoded by this gene is a member of the interleukin 1 cytokine family. This cytokine is produced by activated macrophages as a proprotein, which is proteolytically processed to its active form by caspase 1 (CASP1/ICE). This cytokine is an important mediator of the inflammatory response, and is involved in a variety of cellular activities, including cell proliferation, differentiation, and apoptosis. The induction of cyclooxygenase-2 (PTGS2/COX2) by this cytokine in the central nervous system (CNS) is found to contribute to inflammatory pain hypersensitivity. This gene and eight other interleukin 1 family genes form a cytokine gene cluster on chromosome 2. NA
lysozyme 4069 LYZ ENSG00000090382 This gene encodes human lysozyme, whose natural substrate is the bacterial cell wall peptidoglycan (cleaving the beta[1-4]glycosidic linkages between N-acetylmuramic acid and N-acetylglucosamine). Lysozyme is one of the antimicrobial agents found in human milk, and is also present in spleen, lung, kidney, white blood cells, plasma, saliva, and tears. The protein has antibacterial activity against a number of bacterial species. Missense mutations in this gene have been identified in heritable renal amyloidosis. NA
syntaxin binding protein 2 6813 STXBP2 ENSG00000076944 This gene encodes a member of the STXBP/unc-18/SEC1 family. The encoded protein is involved in intracellular trafficking, control of SNARE (soluble NSF attachment protein receptor) complex assembly, and the release of cytotoxic granules by natural killer cells. Mutations in this gene are associated with familial hemophagocytic lymphohistiocytosis. Alternatively spliced transcript variants encoding different isoforms have been noted for this gene. NA
megakaryocyte-associated tyrosine kinase 4145 MATK ENSG00000007264 The protein encoded by this gene has amino acid sequence similarity to Csk tyrosine kinase and has the structural features of the CSK subfamily: SRC homology SH2 and SH3 domains, a catalytic domain, a unique N terminus, lack of myristylation signals, lack of a negative regulatory phosphorylation site, and lack of an autophosphorylation site. This protein is thought to play a significant role in the signal transduction of hematopoietic cells. It is able to phosphorylate and inactivate Src family kinases, and may play an inhibitory role in the control of T-cell proliferation. This protein might be involved in signaling in some cases of breast cancer. Three alternatively spliced transcript variants that encode different isoforms have been described for this gene. NA
leukocyte receptor tyrosine kinase 4058 LTK ENSG00000062524 The protein encoded by this gene is a member of the ros/insulin receptor family of tyrosine kinases. Tyrosine-specific phosphorylation of proteins is a key to the control of diverse pathways leading to cell growth and differentiation. Multiple transcript variants encoding different isoforms have been found for this gene. NA
tetraspanin 1 10103 TSPAN1 ENSG00000117472 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. NA
2’-5’-oligoadenylate synthetase like 8638 OASL ENSG00000135114 NA NA
immunoglobulin heavy constant gamma 4 (G4m marker) ENSG00000211892 IGHG4 ENSG00000211892 NA NA
integrin subunit alpha X 3687 ITGAX ENSG00000140678 This gene encodes the integrin alpha X chain protein. Integrins are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain. This protein combines with the beta 2 chain (ITGB2) to form a leukocyte-specific integrin referred to as inactivated-C3b (iC3b) receptor 4 (CR4). The alpha X beta 2 complex seems to overlap the properties of the alpha M beta 2 integrin in the adherence of neutrophils and monocytes to stimulated endothelium cells, and in the phagocytosis of complement coated particles. Two transcript variants encoding different isoforms have been found for this gene. NA
Rho GTPase activating protein 45 23526 ARHGAP45 ENSG00000180448 NA NA
cytidine/uridine monophosphate kinase 2 129607 CMPK2 ENSG00000134326 This gene encodes one of the enzymes in the nucleotide synthesis salvage pathway that may participate in terminal differentiation of monocytic cells. Multiple transcript variants encoding different isoforms have been found for this gene. NA
suppression of tumorigenicity 14 6768 ST14 ENSG00000149418 The protein encoded by this gene is an epithelial-derived, integral membrane serine protease. This protease forms a complex with the Kunitz-type serine protease inhibitor, HAI-1, and is found to be activated by sphingosine 1-phosphate. This protease has been shown to cleave and activate hepatocyte growth factor/scattering factor, and urokinase plasminogen activator, which suggest the function of this protease as an epithelial membrane activator for other proteases and latent growth factors. The expression of this protease has been associated with breast, colon, prostate, and ovarian tumors, which implicates its role in cancer invasion, and metastasis. NA
tumor protein D52 7163 TPD52 ENSG00000076554 NA NA
NA NA NA ENSG00000161570 NA TRUE
ribosomal protein S6 kinase A1 6195 RPS6KA1 ENSG00000117676 This gene encodes a member of the RSK (ribosomal S6 kinase) family of serine/threonine kinases. This kinase contains 2 nonidentical kinase catalytic domains and phosphorylates various substrates, including members of the mitogen-activated kinase (MAPK) signalling pathway. The activity of this protein has been implicated in controlling cell growth and differentiation. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. NA
caspase recruitment domain family member 11 84433 CARD11 ENSG00000198286 The protein encoded by this gene belongs to the membrane-associated guanylate kinase (MAGUK) family, a class of proteins that functions as molecular scaffolds for the assembly of multiprotein complexes at specialized regions of the plasma membrane. This protein is also a member of the CARD protein family, which is defined by carrying a characteristic caspase-associated recruitment domain (CARD). This protein has a domain structure similar to that of CARD14 protein. The CARD domains of both proteins have been shown to specifically interact with BCL10, a protein known to function as a positive regulator of cell apoptosis and NF-kappaB activation. When expressed in cells, this protein activated NF-kappaB and induced the phosphorylation of BCL10. NA
protein disulfide isomerase family A member 2 64714 PDIA2 ENSG00000185615 Protein disulfide isomerases (EC 5.3.4.1), such as PDIP, are endoplasmic reticulum (ER) resident proteins that catalyze protein folding and thiol-disulfide interchange reactions (Desilva et al., 1996 [PubMed 8561901]). NA
serine peptidase inhibitor, Kunitz type 1 6692 SPINT1 ENSG00000166145 The protein encoded by this gene is a member of the Kunitz family of serine protease inhibitors. The protein is a potent inhibitor specific for HGF activator and is thought to be involved in the regulation of the proteolytic activation of HGF in injured tissues. Alternative splicing results in multiple variants encoding different isoforms. NA
carcinoembryonic antigen related cell adhesion molecule 1 634 CEACAM1 ENSG00000079385 This gene encodes a member of the carcinoembryonic antigen (CEA) gene family, which belongs to the immunoglobulin superfamily. Two subgroups of the CEA family, the CEA cell adhesion molecules and the pregnancy-specific glycoproteins, are located within a 1.2 Mb cluster on the long arm of chromosome 19. Eleven pseudogenes of the CEA cell adhesion molecule subgroup are also found in the cluster. The encoded protein was originally described in bile ducts of liver as biliary glycoprotein. Subsequently, it was found to be a cell-cell adhesion molecule detected on leukocytes, epithelia, and endothelia. The encoded protein mediates cell adhesion via homophilic as well as heterophilic binding to other proteins of the subgroup. Multiple cellular activities have been attributed to the encoded protein, including roles in the differentiation and arrangement of tissue three-dimensional structure, angiogenesis, apoptosis, tumor suppression, metastasis, and the modulation of innate and adaptive immune responses. Multiple transcript variants encoding different isoforms have been reported, but the full-length nature of all variants has not been defined. NA
RNA binding motif protein 47 54502 RBM47 ENSG00000163694 NA NA
UDP-galactose-4-epimerase 2582 GALE ENSG00000117308 This gene encodes UDP-galactose-4-epimerase which catalyzes two distinct but analogous reactions: the epimerization of UDP-glucose to UDP-galactose, and the epimerization of UDP-N-acetylglucosamine to UDP-N-acetylgalactosamine. The bifunctional nature of the enzyme has the important metabolic consequence that mutant cells (or individuals) are dependent not only on exogenous galactose, but also on exogenous N-acetylgalactosamine as a necessary precursor for the synthesis of glycoproteins and glycolipids. Mutations in this gene result in epimerase-deficiency galactosemia, also referred to as galactosemia type 3, a disease characterized by liver damage, early-onset cataracts, deafness and mental retardation, with symptoms ranging from mild (‘peripheral’ form) to severe (‘generalized’ form). Multiple alternatively spliced transcripts encoding the same protein have been identified. NA
NA ENSG00000255118 RP11-703H8.7 ENSG00000255118 NA NA
keratin 10 3858 KRT10 ENSG00000186395 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. NA
tetraspanin 13 27075 TSPAN13 ENSG00000106537 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. NA
thymocyte selection associated high mobility group box 9760 TOX ENSG00000198846 The protein encoded by this gene contains a HMG box DNA binding domain. HMG boxes are found in many eukaryotic proteins involved in chromatin assembly, transcription and replication. This protein may function to regulate T-cell development. NA
interleukin 18 receptor accessory protein 8807 IL18RAP ENSG00000115607 The protein encoded by this gene is an accessory subunit of the heterodimeric receptor for interleukin 18 (IL18), a proinflammatory cytokine involved in inducing cell-mediated immunity. This protein enhances the IL18-binding activity of the IL18 receptor and plays a role in signaling by IL18. Mutations in this gene are associated with Crohn’s disease and inflammatory bowel disease, and susceptibility to celiac disease and leprosy. Alternatively spliced transcript variants of this gene have been described, but their full-length nature is not known. NA
calponin 1 1264 CNN1 ENSG00000130176 NA NA
NLR family pyrin domain containing 3 114548 NLRP3 ENSG00000162711 This gene encodes a pyrin-like protein containing a pyrin domain, a nucleotide-binding site (NBS) domain, and a leucine-rich repeat (LRR) motif. This protein interacts with the apoptosis-associated speck-like protein PYCARD/ASC, which contains a caspase recruitment domain, and is a member of the NALP3 inflammasome complex. This complex functions as an upstream activator of NF-kappaB signaling, and it plays a role in the regulation of inflammation, the immune response, and apoptosis. Mutations in this gene are associated with familial cold autoinflammatory syndrome (FCAS), Muckle-Wells syndrome (MWS), chronic infantile neurological cutaneous and articular (CINCA) syndrome, and neonatal-onset multisystem inflammatory disease (NOMID). Multiple alternatively spliced transcript variants encoding distinct isoforms have been identified for this gene. Alternative 5’ UTR structures are suggested by available data; however, insufficient evidence is available to determine if all of the represented 5’ UTR splice patterns are biologically valid. NA
Src-like-adaptor 6503 SLA ENSG00000155926 NA NA
lymphotoxin beta 4050 LTB ENSG00000227507 Lymphotoxin beta is a type II membrane protein of the TNF family. It anchors lymphotoxin-alpha to the cell surface through heterotrimer formation. The predominant form on the lymphocyte surface is the lymphotoxin-alpha 1/beta 2 complex (e.g. 1 molecule alpha/2 molecules beta) and this complex is the primary ligand for the lymphotoxin-beta receptor. The minor complex is lymphotoxin-alpha 2/beta 1. LTB is an inducer of the inflammatory response system and involved in normal development of lymphoid tissue. Lymphotoxin-beta isoform b is unable to complex with lymphotoxin-alpha suggesting a function for lymphotoxin-beta which is independent of lymphotoxin-alpha. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
albumin 213 ALB ENSG00000163631 Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. NA
triggering receptor expressed on myeloid cells 2 54209 TREM2 ENSG00000095970 This gene encodes a membrane protein that forms a receptor signaling complex with the TYRO protein tyrosine kinase binding protein. The encoded protein functions in immune response and may be involved in chronic inflammation by triggering the production of constitutive inflammatory cytokines. Defects in this gene are a cause of polycystic lipomembranous osteodysplasia with sclerosing leukoencephalopathy (PLOSL). Alternative splicing results in multiple transcript variants encoding different isoforms. NA
SRY-box 9 6662 SOX9 ENSG00000125398 The protein encoded by this gene recognizes the sequence CCTTGAG along with other members of the HMG-box class DNA-binding proteins. It acts during chondrocyte differentiation and, with steroidogenic factor 1, regulates transcription of the anti-Muellerian hormone (AMH) gene. Deficiencies lead to the skeletal malformation syndrome campomelic dysplasia, frequently with sex reversal. NA
glutathione peroxidase 2 2877 GPX2 ENSG00000176153 This gene is a member of the glutathione peroxidase family and encodes a selenium-dependent glutathione peroxidase that is one of two isoenzymes responsible for the majority of the glutathione-dependent hydrogen peroxide-reducing activity in the epithelium of the gastrointestinal tract. The protein encoded by this locus contains a selenocysteine (Sec) residue encoded by the UGA codon, which normally signals translation termination. Alternatively spliced transcript variants have been described. NA
surfactant protein A1 653509 SFTPA1 ENSG00000122852 This gene encodes a lung surfactant protein that is a member of a subfamily of C-type lectins called collectins. The encoded protein binds specific carbohydrate moieties found on lipids and on the surface of microorganisms. This protein plays an essential role in surfactant homeostasis and in the defense against respiratory pathogens. Mutations in this gene are associated with idiopathic pulmonary fibrosis. Alternate splicing results in multiple transcript variants. NA
lipolysis stimulated lipoprotein receptor 51599 LSR ENSG00000105699 NA NA
PCED1B antisense RNA 1 100233209 PCED1B-AS1 ENSG00000247774 NA NA
solute carrier family 7 member 7 9056 SLC7A7 ENSG00000155465 The protein encoded by this gene is the light subunit of a cationic amino acid transporter. This sodium-independent transporter is formed when the light subunit encoded by this gene dimerizes with the heavy subunit transporter protein SLC3A2. This transporter is found in epithelial cell membranes where it transfers cationic and large neutral amino acids from the cell to the extracellular space. Defects in this gene are a cause of lysinuric protein intolerance (LPI). Alternative splicing results in multiple transcript variants. NA
sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 2 9806 SPOCK2 ENSG00000107742 This gene encodes a protein which binds with glycosaminoglycans to form part of the extracellular matrix. The protein contains thyroglobulin type-1, follistatin-like, and calcium-binding domains, and has glycosaminoglycan attachment sites in the acidic C-terminal region. Three alternatively spliced transcript variants that encode different protein isoforms have been described for this gene. NA
CD52 molecule 1043 CD52 ENSG00000169442 NA NA
apolipoprotein H 350 APOH ENSG00000091583 Apolipoprotein H has been implicated in a variety of physiologic pathways including lipoprotein metabolism, coagulation, and the production of antiphospholipid autoantibodies. APOH may be a required cofactor for anionic phospholipid binding by the antiphospholipid autoantibodies found in sera of many patients with lupus and primary antiphospholipid syndrome, but it does not seem to be required for the reactivity of antiphospholipid autoantibodies associated with infections. NA
NA ENSG00000232934 RP11-324O2.3 ENSG00000232934 NA NA
ABRA C-terminal like 58527 ABRACL ENSG00000146386 NA NA
RAP1 GTPase activating protein 5909 RAP1GAP ENSG00000076864 This gene encodes a type of GTPase-activating-protein (GAP) that down-regulates the activity of the ras-related RAP1 protein. RAP1 acts as a molecular switch by cycling between an inactive GDP-bound form and an active GTP-bound form. The product of this gene, RAP1GAP, promotes the hydrolysis of bound GTP and hence returns RAP1 to the inactive state whereas other proteins, guanine nucleotide exchange factors (GEFs), act as RAP1 activators by facilitating the conversion of RAP1 from the GDP- to the GTP-bound form. In general, ras subfamily proteins, such as RAP1, play key roles in receptor-linked signaling pathways that control cell growth and differentiation. RAP1 plays a role in diverse processes such as cell proliferation, adhesion, differentiation, and embryogenesis. Alternative splicing results in multiple transcript variants encoding distinct proteins. NA
tumor necrosis factor receptor superfamily member 11a 8792 TNFRSF11A ENSG00000141655 The protein encoded by this gene is a member of the TNF-receptor superfamily. This receptor can interact with various TRAF family proteins, through which this receptor induces the activation of NF-kappa B and MAPK8/JNK. This receptor and its ligand are important regulators of the interaction between T cells and dendritic cells. This receptor is also an essential mediator for osteoclast and lymph node development. Mutations at this locus have been associated with familial expansile osteolysis, autosomal recessive osteopetrosis, and Paget disease of bone. Alternatively spliced transcript variants have been described for this locus. NA
NA ENSG00000257764 RP11-1143G9.4 ENSG00000257764 NA NA
Purkinje cell protein 4 5121 PCP4 ENSG00000183036 NA NA
fucosyltransferase 2 2524 FUT2 ENSG00000176920 The protein encoded by this gene is a Golgi stack membrane protein that is involved in the creation of a precursor of the H antigen, which is required for the final step in the soluble A and B antigen synthesis pathway. This gene is one of two encoding the galactoside 2-L-fucosyltransferase enzyme. Two transcript variants encoding the same protein have been found for this gene. NA
coronin 2A 7464 CORO2A ENSG00000106789 This gene encodes a member of the WD repeat protein family. WD repeats are minimally conserved regions of approximately 40 amino acids typically bracketed by gly-his and trp-asp (GH-WD), which may facilitate formation of heterotrimeric or multiprotein complexes. Members of this family are involved in a variety of cellular processes, including cell cycle progression, signal transduction, apoptosis, and gene regulation. This protein contains 5 WD repeats, and has a structural similarity with actin-binding proteins: the D. discoideum coronin and the human p57 protein, suggesting that this protein may also be an actin-binding protein that regulates cell motility. Alternative splicing of this gene generates 2 transcript variants. NA
NA ENSG00000267815 CTB-191K22.5 ENSG00000267815 NA NA
aldo-keto reductase family 7 member A3 22977 AKR7A3 ENSG00000162482 Aldo-keto reductases, such as AKR7A3, are involved in the detoxification of aldehydes and ketones. NA
kelch repeat and BTB domain containing 8 84541 KBTBD8 ENSG00000163376 NA NA
CD4 molecule 920 CD4 ENSG00000010610 This gene encodes a membrane glycoprotein of T lymphocytes that interacts with major histocompatibility complex class II antigens and is also a receptor for the human immunodeficiency virus. This gene is expressed not only in T lymphocytes, but also in B cells, macrophages, and granulocytes. It is also expressed in specific regions of the brain. The protein functions to initiate or augment the early phase of T-cell activation, and may function as an important mediator of indirect neuronal damage in infectious and immune-mediated diseases of the central nervous system. Multiple alternatively spliced transcript variants encoding different isoforms have been identified in this gene. NA
cytochrome b-245 alpha chain 1535 CYBA ENSG00000051523 Cytochrome b is comprised of a light chain (alpha) and a heavy chain (beta). This gene encodes the light, alpha subunit which has been proposed as a primary component of the microbicidal oxidase system of phagocytes. Mutations in this gene are associated with autosomal recessive chronic granulomatous disease (CGD), that is characterized by the failure of activated phagocytes to generate superoxide, which is important for the microbicidal activity of these cells. NA
BCL2 like 15 440603 BCL2L15 ENSG00000188761 NA NA
lysosomal protein transmembrane 5 7805 LAPTM5 ENSG00000162511 This gene encodes a transmembrane receptor that is associated with lysosomes. The encoded protein, also known as E3 protein, may play a role in hematopoiesis. NA
potassium voltage-gated channel subfamily A regulatory beta subunit 2 8514 KCNAB2 ENSG00000069424 Voltage-gated potassium (Kv) channels represent the most complex class of voltage-gated ion channels from both functional and structural standpoints. Their diverse functions include regulating neurotransmitter release, heart rate, insulin secretion, neuronal excitability, epithelial electrolyte transport, smooth muscle contraction, and cell volume. Four sequence-related potassium channel genes - shaker, shaw, shab, and shal - have been identified in Drosophila, and each has been shown to have human homolog(s). This gene encodes a member of the potassium channel, voltage-gated, shaker-related subfamily. This member is one of the beta subunits, which are auxiliary proteins associating with functional Kv-alpha subunits. This member alters functional properties of the KCNA4 gene product. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. NA
activin A receptor like type 1 94 ACVRL1 ENSG00000139567 This gene encodes a type I cell-surface receptor for the TGF-beta superfamily of ligands. It shares with other type I receptors a high degree of similarity in serine-threonine kinase subdomains, a glycine- and serine-rich region (called the GS domain) preceding the kinase domain, and a short C-terminal tail. The encoded protein, sometimes termed ALK1, shares similar domain structures with other closely related ALK or activin receptor-like kinase proteins that form a subfamily of receptor serine/threonine kinases. Mutations in this gene are associated with hemorrhagic telangiectasia type 2, also known as Rendu-Osler-Weber syndrome 2. NA
keratin 7 3855 KRT7 ENSG00000135480 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the simple epithelia lining the cavities of the internal organs and in the gland ducts and blood vessels. The genes encoding the type II cytokeratins are clustered in a region of chromosome 12q12-q13. Alternative splicing may result in several transcript variants; however, not all variants have been fully described. NA
selectin L 6402 SELL ENSG00000188404 This gene encodes a cell surface adhesion molecule that belongs to a family of adhesion/homing receptors. The encoded protein contains a C-type lectin-like domain, a calcium-binding epidermal growth factor-like domain, and two short complement-like repeats. The gene product is required for binding and subsequent rolling of leucocytes on endothelial cells, facilitating their migration into secondary lymphoid organs and inflammation sites. Single-nucleotide polymorphisms in this gene have been associated with various diseases including immunoglobulin A nephropathy. Alternatively spliced transcript variants have been found for this gene. NA
NA ENSG00000255468 RP11-867G23.8 ENSG00000255468 NA NA
docking protein 3 79930 DOK3 ENSG00000146094 NA NA
sorting nexin 10 29887 SNX10 ENSG00000086300 This gene encodes a member of the sorting nexin family. Members of this family contain a phox (PX) domain, which is a phosphoinositide binding domain, and are involved in intracellular trafficking. This protein does not contain a coiled coil region, like some family members. This gene may play a role in regulating endosome homeostasis. Alternative splicing results in multiple transcript variants. NA
FYN binding protein 2533 FYB ENSG00000082074 The protein encoded by this gene is an adapter for the FYN protein and LCP2 signaling cascades in T-cells. The encoded protein is involved in platelet activation and controls the expression of interleukin-2. Three transcript variants encoding different isoforms have been found for this gene. NA
NA ENSG00000233483 CTD-2020K17.4 ENSG00000233483 NA NA
calmodulin like 5 51806 CALML5 ENSG00000178372 This gene encodes a novel calcium binding protein expressed in the epidermis and related to the calmodulin family of calcium binding proteins. Functional studies with recombinant protein demonstrate it does bind calcium and undergoes a conformational change when it does so. Abundant expression is detected only in reconstructed epidermis and is restricted to differentiating keratinocytes. In addition, it can associate with transglutaminase 3, shown to be a key enzyme in the terminal differentiation of keratinocytes. NA
T-box 15 6913 TBX15 ENSG00000092607 This gene belongs to the T-box family of genes, which encode a phylogenetically conserved family of transcription factors that regulate a variety of developmental processes. All these genes contain a common T-box DNA-binding domain. Mutations in this gene are associated with Cousin syndrome. NA
hepatitis A virus cellular receptor 2 84868 HAVCR2 ENSG00000135077 The protein encoded by this gene belongs to the immunoglobulin superfamily, and TIM family of proteins. CD4-positive T helper lymphocytes can be divided into types 1 (Th1) and 2 (Th2) on the basis of their cytokine secretion patterns. Th1 cells are involved in cell-mediated immunity to intracellular pathogens and delayed-type hypersensitivity reactions, whereas, Th2 cells are involved in the control of extracellular helminthic infections and the promotion of atopic and allergic diseases. This protein is a Th1-specific cell surface protein that regulates macrophage activation, and inhibits Th1-mediated auto- and alloimmune responses, and promotes immunological tolerance. NA
surfactant protein A2 729238 SFTPA2 ENSG00000185303 This gene is one of several genes encoding pulmonary-surfactant associated proteins (SFTPA) located on chromosome 10. Mutations in this gene and a highly similar gene located nearby, which affect the highly conserved carbohydrate recognition domain, are associated with idiopathic pulmonary fibrosis. The current version of the assembly displays only a single centromeric SFTPA gene pair rather than the two gene pairs shown in the previous assembly which were thought to have resulted from a duplication. NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_",16,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
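
The `write.table` call above hard-codes the factor index in the file name. A minimal sketch of a reusable wrapper follows; `write_factor_genes` and its arguments are hypothetical names introduced here for illustration and are not part of the original analysis. The example writes to a temporary directory rather than to `../utilities/`.

```r
# Hypothetical helper (an assumption, not the author's code): writes the
# query IDs for factor k to "<out_dir>/gene_names_clus_<k>.txt", one per line.
write_factor_genes <- function(queries, k, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(data.frame(query = as.character(queries)), path,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Example with two Ensembl IDs from this analysis, written to a temp directory.
demo_ids <- c("ENSG00000104140", "ENSG00000173432")
demo_path <- write_factor_genes(demo_ids, 17, tempdir())
```

In the analysis itself the call would presumably look like `write_factor_genes(out$query, 16, "../utilities/GTEX2013_sparse_fac_voom")`, assuming `out$query` holds the Ensembl IDs as in the tables.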

Factor 17 Annotations

out <- mygene::queryMany(gene_list[17,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol query X_id name summary notfound
RHOV ENSG00000104140 171177 ras homolog family member V NA NA
SAA1 ENSG00000173432 6288 serum amyloid A1 This gene encodes a member of the serum amyloid A family of apolipoproteins. The encoded preproprotein is proteolytically processed to generate the mature protein. This protein is a major acute phase protein that is highly expressed in response to inflammation and tissue injury. This protein also plays an important role in HDL metabolism and cholesterol homeostasis. High levels of this protein are associated with chronic inflammatory diseases including atherosclerosis, rheumatoid arthritis, Alzheimer’s disease and Crohn’s disease. This protein may also be a potential biomarker for certain tumors. Alternate splicing results in multiple transcript variants that encode the same protein. A pseudogene of this gene is found on chromosome 11. NA
CD200 ENSG00000091972 4345 CD200 molecule This gene encodes a type I membrane glycoprotein containing two extracellular immunoglobulin domains, a transmembrane and a cytoplasmic domain. This gene is expressed by various cell types, including B cells, a subset of T cells, thymocytes, endothelial cells, and neurons. The encoded protein plays an important role in immunosuppression and regulation of anti-tumor activity. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
ANKRD22 ENSG00000152766 118932 ankyrin repeat domain 22 NA NA
LRRC1 ENSG00000137269 55227 leucine rich repeat containing 1 NA NA
RP3-523E19.2 ENSG00000271218 ENSG00000271218 NA NA NA
STX19 ENSG00000178750 415117 syntaxin 19 NA NA
DEGS2 ENSG00000168350 123099 delta(4)-desaturase, sphingolipid 2 This gene encodes a bifunctional enzyme that is involved in the biosynthesis of phytosphingolipids in human skin and in other phytosphingolipid-containing tissues. This enzyme can act as a sphingolipid delta(4)-desaturase, and also as a sphingolipid C4-hydroxylase. NA
CTA-293F17.1 ENSG00000271133 ENSG00000271133 NA NA NA
SYTL1 ENSG00000142765 84958 synaptotagmin like 1 NA NA
HPD ENSG00000158104 3242 4-hydroxyphenylpyruvate dioxygenase The protein encoded by this gene is an enzyme in the catabolic pathway of tyrosine. The encoded protein catalyzes the conversion of 4-hydroxyphenylpyruvate to homogentisate. Defects in this gene are a cause of tyrosinemia type 3 (TYRO3) and hawkinsinuria (HAWK). Two transcript variants encoding different isoforms have been found for this gene. NA
SPARCL1 ENSG00000152583 8404 SPARC like 1 NA NA
C15orf48 ENSG00000166920 84419 chromosome 15 open reading frame 48 This gene was first identified in a study of human esophageal squamous cell carcinoma tissues. Levels of both the message and protein are reduced in carcinoma samples. In adult human tissues, this gene is expressed in the esophagus, stomach, small intestine, colon and placenta. Alternatively spliced transcript variants that encode the same protein have been identified. NA
MST1R ENSG00000164078 4486 macrophage stimulating 1 receptor This gene encodes a cell surface receptor for macrophage-stimulating protein (MSP) with tyrosine kinase activity. The mature form of this protein is a heterodimer of disulfide-linked alpha and beta subunits, generated by proteolytic cleavage of a single-chain precursor. The beta subunit undergoes tyrosine phosphorylation upon stimulation by MSP. This protein is expressed on the ciliated epithelia of the mucociliary transport apparatus of the lung, and together with MSP, thought to be involved in host defense. Alternative splicing generates multiple transcript variants encoding different isoforms that may undergo similar proteolytic processing. NA
CNN1 ENSG00000130176 1264 calponin 1 NA NA
JPH1 ENSG00000104369 56704 junctophilin 1 Junctional complexes between the plasma membrane and endoplasmic/sarcoplasmic reticulum are a common feature of all excitable cell types and mediate cross talk between cell surface and intracellular ion channels. The protein encoded by this gene is a component of junctional complexes and is composed of a C-terminal hydrophobic segment spanning the endoplasmic/sarcoplasmic reticulum membrane and a remaining cytoplasmic domain that shows specific affinity for the plasma membrane. This gene is a member of the junctophilin gene family. NA
NA ENSG00000205246 NA NA NA TRUE
NRARP ENSG00000198435 441478 NOTCH-regulated ankyrin repeat protein NA NA
NAGS ENSG00000161653 162417 N-acetylglutamate synthase The N-acetylglutamate synthase gene encodes a mitochondrial enzyme that catalyzes the formation of N-acetylglutamate (NAG) from glutamate and acetyl coenzyme-A. NAG is a cofactor of carbamyl phosphate synthetase I (CPSI), the first enzyme of the urea cycle in mammals. This gene may regulate ureagenesis by altering NAG availability and, thereby, CPSI activity. Deficiencies in N-acetylglutamate synthase have been associated with hyperammonemia. NA
APOC1 ENSG00000130208 341 apolipoprotein C1 This gene encodes a member of the apolipoprotein C1 family. This gene is expressed primarily in the liver, and it is activated when monocytes differentiate into macrophages. The encoded protein plays a central role in high density lipoprotein (HDL) and very low density lipoprotein (VLDL) metabolism. This protein has also been shown to inhibit cholesteryl ester transfer protein in plasma. A pseudogene of this gene is located 4 kb downstream in the same orientation, on the same chromosome. This gene is mapped to chromosome 19, where it resides within an apolipoprotein gene cluster. NA
FTCD ENSG00000160282 10841 formimidoyltransferase cyclodeaminase The protein encoded by this gene is a bifunctional enzyme that channels 1-carbon units from formiminoglutamate, a metabolite of the histidine degradation pathway, to the folate pool. Mutations in this gene are associated with glutamate formiminotransferase deficiency. Alternatively spliced transcript variants have been found for this gene. NA
FAR1 ENSG00000197601 84188 fatty acyl-CoA reductase 1 The protein encoded by this gene is required for the reduction of fatty acids to fatty alcohols, a process that is required for the synthesis of monoesters and ether lipids. NADPH is required as a cofactor in this reaction, and 16-18 carbon saturated and unsaturated fatty acids are the preferred substrate. This is a peroxisomal membrane protein, and studies suggest that the N-terminus contains a large catalytic domain located on the outside of the peroxisome, while the C-terminus is exposed to the matrix of the peroxisome. Studies indicate that the regulation of this protein is dependent on plasmalogen levels. Mutations in this gene have been associated with individuals affected by severe intellectual disability, early-onset epilepsy, microcephaly, congenital cataracts, growth retardation, and spasticity (PMID: 25439727). A pseudogene of this gene is located on chromosome 13. NA
C1R ENSG00000159403 715 complement C1r subcomponent NA NA
CASQ2 ENSG00000118729 845 calsequestrin 2 The protein encoded by this gene specifies the cardiac muscle family member of the calsequestrin family. Calsequestrin is localized to the sarcoplasmic reticulum in cardiac and slow skeletal muscle cells. The protein is a calcium binding protein that stores calcium for muscle function. Mutations in this gene cause stress-induced polymorphic ventricular tachycardia, also referred to as catecholaminergic polymorphic ventricular tachycardia 2 (CPVT2), a disease characterized by bidirectional ventricular tachycardia that may lead to cardiac arrest. NA
MPZL2 ENSG00000149573 10205 myelin protein zero like 2 Thymus development depends on a complex series of interactions between thymocytes and the stromal component of the organ. Epithelial V-like antigen (EVA) is expressed in thymus epithelium and strongly downregulated by thymocyte developmental progression. This gene is expressed in the thymus and in several epithelial structures early in embryogenesis. It is highly homologous to the myelin protein zero and, in thymus-derived epithelial cell lines, is poorly soluble in nonionic detergents, strongly suggesting an association to the cytoskeleton. Its capacity to mediate cell adhesion through a homophilic interaction and its selective regulation by T cell maturation might imply the participation of EVA in the earliest phases of thymus organogenesis. The protein bears a characteristic V-type domain and two potential N-glycosylation sites in the extracellular domain; a putative serine phosphorylation site for casein kinase 2 is also present in the cytoplasmic tail. Two transcript variants encoding the same protein have been found for this gene. NA
TTPAL ENSG00000124120 79183 alpha tocopherol transfer protein like NA NA
KIT ENSG00000157404 3815 KIT proto-oncogene receptor tyrosine kinase This gene encodes the human homolog of the proto-oncogene c-kit. C-kit was first identified as the cellular homolog of the feline sarcoma viral oncogene v-kit. This protein is a type 3 transmembrane receptor for MGF (mast cell growth factor, also known as stem cell factor). Mutations in this gene are associated with gastrointestinal stromal tumors, mast cell disease, acute myelogenous leukemia, and piebaldism. Multiple transcript variants encoding different isoforms have been found for this gene. NA
NTN1 ENSG00000065320 9423 netrin 1 Netrin is included in a family of laminin-related secreted proteins. The function of this gene has not yet been defined; however, netrin is thought to be involved in axon guidance and cell migration during development. Mutations and loss of expression of netrin suggest that variation in netrin may be involved in cancer development. NA
SAA2 ENSG00000134339 6289 serum amyloid A2 NA NA
ORM1 ENSG00000229314 5004 orosomucoid 1 This gene encodes a key acute phase plasma protein. Because of its increase due to acute inflammation, this protein is classified as an acute-phase reactant. The specific function of this protein has not yet been determined; however, it may be involved in aspects of immunosuppression. NA
SPINT1 ENSG00000166145 6692 serine peptidase inhibitor, Kunitz type 1 The protein encoded by this gene is a member of the Kunitz family of serine protease inhibitors. The protein is a potent inhibitor specific for HGF activator and is thought to be involved in the regulation of the proteolytic activation of HGF in injured tissues. Alternative splicing results in multiple variants encoding different isoforms. NA
CEBPB ENSG00000172216 1051 CCAAT/enhancer binding protein beta This intronless gene encodes a transcription factor that contains a basic leucine zipper (bZIP) domain. The encoded protein functions as a homodimer but can also form heterodimers with CCAAT/enhancer-binding proteins alpha, delta, and gamma. Activity of this protein is important in the regulation of genes involved in immune and inflammatory responses, among other processes. The use of alternative in-frame AUG start codons results in multiple protein isoforms, each with distinct biological functions. NA
AF131215.9 ENSG00000269918 ENSG00000269918 NA NA NA
AOX1 ENSG00000138356 316 aldehyde oxidase 1 Aldehyde oxidase produces hydrogen peroxide and, under certain conditions, can catalyze the formation of superoxide. Aldehyde oxidase is a candidate gene for amyotrophic lateral sclerosis. NA
PDGFA ENSG00000197461 5154 platelet derived growth factor subunit A This gene encodes a member of the protein family comprised of both platelet-derived growth factors (PDGF) and vascular endothelial growth factors (VEGF). The encoded preproprotein is proteolytically processed to generate platelet-derived growth factor subunit A, which can homodimerize, or alternatively, heterodimerize with the related platelet-derived growth factor subunit B. These proteins bind and activate PDGF receptor tyrosine kinases, which play a role in a wide range of developmental processes. Alternative splicing results in multiple transcript variants. NA
IVD ENSG00000128928 3712 isovaleryl-CoA dehydrogenase Isovaleryl-CoA dehydrogenase (IVD) is a mitochondrial matrix enzyme that catalyzes the third step in leucine catabolism. The genetic deficiency of IVD results in an accumulation of isovaleric acid, which is toxic to the central nervous system and leads to isovaleric acidemia. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
APLP1 ENSG00000105290 333 amyloid beta precursor like protein 1 This gene encodes a member of the highly conserved amyloid precursor protein gene family. The encoded protein is a membrane-associated glycoprotein that is cleaved by secretases in a manner similar to amyloid beta A4 precursor protein cleavage. This cleavage liberates an intracellular cytoplasmic fragment that may act as a transcriptional activator. The encoded protein may also play a role in synaptic maturation during cortical development. Alternatively spliced transcript variants encoding different isoforms have been described. NA
NA ENSG00000241732 NA NA NA TRUE
PDIA2 ENSG00000185615 64714 protein disulfide isomerase family A member 2 Protein disulfide isomerases (EC 5.3.4.1), such as PDIP, are endoplasmic reticulum (ER) resident proteins that catalyze protein folding and thiol-disulfide interchange reactions (Desilva et al., 1996 [PubMed 8561901]). NA
PTGES3L ENSG00000267060 100885848 prostaglandin E synthase 3 (cytosolic)-like NA NA
TC2N ENSG00000165929 123036 tandem C2 domains, nuclear NA NA
LOC105378272 ENSG00000230555 105378272 uncharacterized LOC105378272 NA NA
POR ENSG00000127948 5447 cytochrome p450 oxidoreductase This gene encodes an endoplasmic reticulum membrane oxidoreductase with an FAD-binding domain and a flavodoxin-like domain. The protein binds two cofactors, FAD and FMN, which allow it to donate electrons directly from NADPH to all microsomal P450 enzymes. Mutations in this gene have been associated with various diseases, including apparent combined P450C17 and P450C21 deficiency, amenorrhea and disordered steroidogenesis, congenital adrenal hyperplasia and Antley-Bixler syndrome. NA
MFSD4A ENSG00000174514 148808 major facilitator superfamily domain containing 4A NA NA
RIMS3 ENSG00000117016 9783 regulating synaptic membrane exocytosis 3 NA NA
RBM11 ENSG00000185272 54033 RNA binding motif protein 11 NA NA
IGHA1 ENSG00000211895 ENSG00000211895 immunoglobulin heavy constant alpha 1 NA NA
HP ENSG00000257017 3240 haptoglobin This gene encodes a preproprotein, which is processed to yield both alpha and beta chains, which subsequently combine as a tetramer to produce haptoglobin. Haptoglobin functions to bind free plasma hemoglobin, which allows degradative enzymes to gain access to the hemoglobin, while at the same time preventing loss of iron through the kidneys and protecting the kidneys from damage by hemoglobin. Mutations in this gene and/or its regulatory regions cause ahaptoglobinemia or hypohaptoglobinemia. This gene has also been linked to diabetic nephropathy, the incidence of coronary artery disease in type 1 diabetes, Crohn’s disease, inflammatory disease behavior, primary sclerosing cholangitis, susceptibility to idiopathic Parkinson’s disease, and a reduced incidence of Plasmodium falciparum malaria. The protein encoded also exhibits antimicrobial activity against bacteria. A similar duplicated gene is located next to this gene on chromosome 16. Multiple transcript variants encoding different isoforms have been found for this gene. NA
DUSP23 ENSG00000158716 54935 dual specificity phosphatase 23 NA NA
AIF1L ENSG00000126878 83543 allograft inflammatory factor 1 like NA NA
TP73 ENSG00000078900 7161 tumor protein p73 This gene encodes a member of the p53 family of transcription factors involved in cellular responses to stress and development. It maps to a region on chromosome 1p36 that is frequently deleted in neuroblastoma and other tumors, and thought to contain multiple tumor suppressor genes. The demonstration that this gene is monoallelically expressed (likely from the maternal allele), supports the notion that it is a candidate gene for neuroblastoma. Many transcript variants resulting from alternative splicing and/or use of alternate promoters have been found for this gene, but the biological validity and the full-length nature of some variants have not been determined. NA
LOC101930370 ENSG00000245213 101930370 uncharacterized LOC101930370 NA NA
PKP3 ENSG00000184363 11187 plakophilin 3 This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This protein may act in cellular desmosome-dependent adhesion and signaling pathways. Two transcript variants encoding different isoforms have been found for this gene. NA
GALNT7 ENSG00000109586 51809 polypeptide N-acetylgalactosaminyltransferase 7 This gene encodes GalNAc transferase 7, a member of the GalNAc-transferase family. The enzyme encoded by this gene controls the initiation step of mucin-type O-linked protein glycosylation and transfer of N-acetylgalactosamine to serine and threonine amino acid residues. This enzyme is a type II transmembrane protein and shares common sequence motifs with other family members. Unlike other family members, this enzyme shows exclusive specificity for partially GalNAc-glycosylated acceptor substrates and shows no activity with non-glycosylated peptides. This protein may function as a follow-up enzyme in the initiation step of O-glycosylation. NA
NXPH3 ENSG00000182575 11248 neurexophilin 3 NA NA
STAR ENSG00000147465 6770 steroidogenic acute regulatory protein The protein encoded by this gene plays a key role in the acute regulation of steroid hormone synthesis by enhancing the conversion of cholesterol into pregnenolone. This protein permits the cleavage of cholesterol into pregnenolone by mediating the transport of cholesterol from the outer mitochondrial membrane to the inner mitochondrial membrane. Mutations in this gene are a cause of congenital lipoid adrenal hyperplasia (CLAH), also called lipoid CAH. A pseudogene of this gene is located on chromosome 13. NA
HSD11B1 ENSG00000117594 3290 hydroxysteroid 11-beta dehydrogenase 1 The protein encoded by this gene is a microsomal enzyme that catalyzes the conversion of the stress hormone cortisol to the inactive metabolite cortisone. In addition, the encoded protein can catalyze the reverse reaction, the conversion of cortisone to cortisol. Too much cortisol can lead to central obesity, and a particular variation in this gene has been associated with obesity and insulin resistance in children. Mutations in this gene and H6PD (hexose-6-phosphate dehydrogenase (glucose 1-dehydrogenase)) are the cause of cortisone reductase deficiency. Alternate splicing results in multiple transcript variants encoding the same protein. NA
PLAC8 ENSG00000145287 51316 placenta specific 8 NA NA
IRS2 ENSG00000185950 8660 insulin receptor substrate 2 This gene encodes the insulin receptor substrate 2, a cytoplasmic signaling molecule that mediates effects of insulin, insulin-like growth factor 1, and other cytokines by acting as a molecular adaptor between diverse receptor tyrosine kinases and downstream effectors. The product of this gene is phosphorylated by the insulin receptor tyrosine kinase upon receptor stimulation, as well as by an interleukin 4 receptor-associated kinase in response to IL4 treatment. NA
NA ENSG00000273281 NA NA NA TRUE
RPS3AP47 ENSG00000205871 ENSG00000205871 ribosomal protein S3a pseudogene 47 NA NA
LETM2 ENSG00000165046 137994 leucine zipper and EF-hand containing transmembrane protein 2 NA NA
H6PD ENSG00000049239 9563 hexose-6-phosphate dehydrogenase/glucose 1-dehydrogenase There are 2 forms of glucose-6-phosphate dehydrogenase. G form is X-linked and H form, encoded by this gene, is autosomally linked. This H form shows activity with other hexose-6-phosphates, especially galactose-6-phosphate, whereas the G form is specific for glucose-6-phosphate. Both forms are present in most tissues, but H form is not found in red cells. NA
ACKR1 ENSG00000213088 2532 atypical chemokine receptor 1 (Duffy blood group) The protein encoded by this gene is a glycosylated membrane protein and a non-specific receptor for several chemokines. The encoded protein is the receptor for the human malarial parasites Plasmodium vivax and Plasmodium knowlesi. Polymorphisms in this gene are the basis of the Duffy blood group system. Two transcript variants encoding different isoforms have been found for this gene. NA
NA ENSG00000204807 NA NA NA TRUE
ASGR1 ENSG00000141505 432 asialoglycoprotein receptor 1 This gene encodes a subunit of the asialoglycoprotein receptor. This receptor is a transmembrane protein that plays a critical role in serum glycoprotein homeostasis by mediating the endocytosis and lysosomal degradation of glycoproteins with exposed terminal galactose or N-acetylgalactosamine residues. The asialoglycoprotein receptor may facilitate hepatic infection by multiple viruses including hepatitis B, and is also a target for liver-specific drug delivery. The asialoglycoprotein receptor is a hetero-oligomeric protein composed of major and minor subunits, which are encoded by different genes. The protein encoded by this gene is the more abundant major subunit. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
KCNN4 ENSG00000104783 3783 potassium calcium-activated channel subfamily N member 4 The protein encoded by this gene is part of a potentially heterotetrameric voltage-independent potassium channel that is activated by intracellular calcium. Activation is followed by membrane hyperpolarization, which promotes calcium influx. The encoded protein may be part of the predominant calcium-activated potassium channel in T-lymphocytes. This gene is similar to other KCNN family potassium channel genes, but it differs enough to possibly be considered as part of a new subfamily. NA
HBEGF ENSG00000113070 1839 heparin binding EGF like growth factor NA NA
MPZL3 ENSG00000160588 196264 myelin protein zero like 3 NA NA
DBNDD2 ENSG00000244274 55861 dysbindin domain containing 2 NA NA
KCNK1 ENSG00000135750 3775 potassium two pore domain channel subfamily K member 1 This gene encodes one of the members of the superfamily of potassium channel proteins containing two pore-forming P domains. The product of this gene has not been shown to be a functional channel, however, it may require other non-pore-forming proteins for activity. NA
SPINK1 ENSG00000164266 6690 serine peptidase inhibitor, Kazal type 1 The protein encoded by this gene is a trypsin inhibitor, which is secreted from pancreatic acinar cells into pancreatic juice. It is thought to function in the prevention of trypsin-catalyzed premature activation of zymogens within the pancreas and the pancreatic duct. Mutations in this gene are associated with hereditary pancreatitis and tropical calcific pancreatitis. NA
RP11-798M19.3 ENSG00000248774 ENSG00000248774 NA NA NA
RP11-799B12.2 ENSG00000264924 ENSG00000264924 NA NA NA
PRSS3 ENSG00000010438 5646 protease, serine 3 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is expressed in the brain and pancreas and is resistant to common trypsin inhibitors. It is active on peptide linkages involving the carboxyl group of lysine or arginine. This gene is localized to the locus of T cell receptor beta variable orphans on chromosome 9. Four transcript variants encoding different isoforms have been described for this gene. NA
REG1B ENSG00000172023 5968 regenerating family member 1 beta This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. NA
GPAM ENSG00000119927 57678 glycerol-3-phosphate acyltransferase, mitochondrial This gene encodes a mitochondrial enzyme which prefers saturated fatty acids as its substrate for the synthesis of glycerolipids. This metabolic pathway’s first step is catalyzed by the encoded enzyme. Two forms for this enzyme exist, one in the mitochondria and one in the endoplasmic reticulum. Two alternatively spliced transcript variants have been described for this gene. NA
CELA3B ENSG00000219073 23436 chymotrypsin like elastase family member 3B Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3B has little elastolytic activity. Like most of the human elastases, elastase 3B is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3B preferentially cleaves proteins after alanine residues. Elastase 3B may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1, and excretion of this protein in fecal material is frequently used as a measure of pancreatic function in clinical assays. NA
REG1A ENSG00000115386 5967 regenerating family member 1 alpha This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. NA
LYZ ENSG00000090382 4069 lysozyme This gene encodes human lysozyme, whose natural substrate is the bacterial cell wall peptidoglycan (cleaving the beta[1-4]glycosidic linkages between N-acetylmuramic acid and N-acetylglucosamine). Lysozyme is one of the antimicrobial agents found in human milk, and is also present in spleen, lung, kidney, white blood cells, plasma, saliva, and tears. The protein has antibacterial activity against a number of bacterial species. Missense mutations in this gene have been identified in heritable renal amyloidosis. NA
RBP4 ENSG00000138207 5950 retinol binding protein 4 This protein belongs to the lipocalin family and is the specific carrier for retinol (vitamin A alcohol) in the blood. It delivers retinol from the liver stores to the peripheral tissues. In plasma, the RBP-retinol complex interacts with transthyretin which prevents its loss by filtration through the kidney glomeruli. A deficiency of vitamin A blocks secretion of the binding protein posttranslationally and results in defective delivery and supply to the epidermal cells. NA
AF001548.5 ENSG00000263335 ENSG00000263335 NA NA NA
APOC3 ENSG00000110245 345 apolipoprotein C3 Apolipoprotein C-III is a very low density lipoprotein (VLDL) protein. APOC3 inhibits lipoprotein lipase and hepatic lipase; it is thought to delay catabolism of triglyceride-rich particles. The APOA1, APOC3 and APOA4 genes are closely linked in both rat and human genomes. The A-I and A-IV genes are transcribed from the same strand, while the A-1 and C-III genes are convergently transcribed. An increase in apoC-III levels induces the development of hypertriglyceridemia. NA
RP11-50D9.1 ENSG00000244021 ENSG00000244021 NA NA NA
MUC1 ENSG00000185499 4582 mucin 1, cell surface associated This gene encodes a membrane-bound protein that is a member of the mucin family. Mucins are O-glycosylated proteins that play an essential role in forming protective mucous barriers on epithelial surfaces. These proteins also play a role in intracellular signaling. This protein is expressed on the apical surface of epithelial cells that line the mucosal surfaces of many different tissues including lung, breast, stomach and pancreas. This protein is proteolytically cleaved into alpha and beta subunits that form a heterodimeric complex. The N-terminal alpha subunit functions in cell-adhesion and the C-terminal beta subunit is involved in cell signaling. Overexpression, aberrant intracellular localization, and changes in glycosylation of this protein have been associated with carcinomas. This gene is known to contain a highly polymorphic variable number tandem repeats (VNTR) domain. Alternate splicing results in multiple transcript variants. NA
ZNF738 ENSG00000172687 ENSG00000172687 zinc finger protein 738 NA NA
RP11-54O7.17 ENSG00000272512 ENSG00000272512 NA NA NA
WDR76 ENSG00000092470 79968 WD repeat domain 76 NA NA
JCHAIN ENSG00000132465 3512 joining chain of multimeric IgA and IgM NA NA
ATF5 ENSG00000169136 22809 activating transcription factor 5 NA NA
SOX4 ENSG00000124766 6659 SRY-box 4 This intronless gene encodes a member of the SOX (SRY-related HMG-box) family of transcription factors involved in the regulation of embryonic development and in the determination of the cell fate. The encoded protein may act as a transcriptional regulator after forming a protein complex with other proteins, such as syndecan binding protein (syntenin). The protein may function in the apoptosis pathway leading to cell death as well as to tumorigenesis and may mediate downstream effects of parathyroid hormone (PTH) and PTH-related protein (PTHrP) in bone development. The solution structure has been resolved for the HMG-box of a similar mouse protein. NA
STRADB ENSG00000082146 55437 STE20-related kinase adaptor beta This gene encodes a protein that belongs to the serine/threonine protein kinase STE20 subfamily. One of the active site residues in the protein kinase domain of this protein is altered, and it is thus a pseudokinase. This protein is a component of a complex involved in the activation of serine/threonine kinase 11, a master kinase that regulates cell polarity and energy-generating metabolism. This complex regulates the relocation of this kinase from the nucleus to the cytoplasm, and it is essential for G1 cell cycle arrest mediated by this kinase. The protein encoded by this gene can also interact with the X chromosome-linked inhibitor of apoptosis protein, and this interaction enhances the anti-apoptotic activity of this protein via the JNK1 signal transduction pathway. Two pseudogenes, located on chromosomes 1 and 7, have been found for this gene. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
VTN ENSG00000109072 7448 vitronectin The protein encoded by this gene is a member of the pexin family. It is found in serum and tissues and promotes cell adhesion and spreading, inhibits the membrane-damaging effect of the terminal cytolytic complement pathway, and binds to several serpin serine protease inhibitors. It is a secreted protein and exists in either a single chain form or a clipped, two chain form held together by a disulfide bond. NA
THNSL2 ENSG00000144115 55258 threonine synthase like 2 This gene encodes a threonine synthase-like protein. A similar enzyme in mouse can catalyze the degradation of O-phospho-homoserine to a-ketobutyrate, phosphate, and ammonia. This protein also has phospho-lyase activity on both gamma and beta phosphorylated substrates. In mouse an alternatively spliced form of this protein has been shown to act as a cytokine and can induce the production of the inflammatory cytokine IL6 in osteoblasts. Alternate splicing results in multiple transcript variants. NA
PRSS1 ENSG00000204983 5644 protease, serine 1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. NA
SPECC1 ENSG00000128487 92521 sperm antigen with calponin homology and coiled-coil domains 1 The protein encoded by this gene belongs to the cytospin-A family. It is localized in the nucleus, and highly expressed in testis and some cancer cell lines. A chromosomal translocation involving this gene and platelet-derived growth factor receptor, beta gene (PDGFRB) may be a cause of juvenile myelomonocytic leukemia. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. NA
RPL7P19 ENSG00000241458 ENSG00000241458 ribosomal protein L7 pseudogene 19 NA NA
TNNT2 ENSG00000118194 7139 troponin T2, cardiac type The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms, however, the full-length nature of some of these variants has not yet been determined. NA
MGST1 ENSG00000008394 4257 microsomal glutathione S-transferase 1 The MAPEG (Membrane Associated Proteins in Eicosanoid and Glutathione metabolism) family consists of six human proteins, two of which are involved in the production of leukotrienes and prostaglandin E, important mediators of inflammation. Other family members, demonstrating glutathione S-transferase and peroxidase activities, are involved in cellular defense against toxic, carcinogenic, and pharmacologically active electrophilic compounds. This gene encodes a protein that catalyzes the conjugation of glutathione to electrophiles and the reduction of lipid hydroperoxides. This protein is localized to the endoplasmic reticulum and outer mitochondrial membrane where it is thought to protect these membranes from oxidative stress. Several transcript variants, some non-protein coding and some protein coding, have been found for this gene. NA
SNHG12 ENSG00000197989 85028 small nucleolar RNA host gene 12 NA NA
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 17, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
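
The annotation tables in this report do not always list their columns in the same order, and some queried Ensembl IDs come back with `notfound` set to `TRUE`. The sketch below shows one way to put the columns in a fixed order and to separate the unmatched IDs before printing or writing the results. The helper name `tidy_annotations` and the toy data frame are hypothetical and not part of the original analysis; the toy rows are copied from the Factor 17 table, and the real `out` object returned by `mygene::queryMany` is assumed to convert cleanly with `as.data.frame`.

```r
# Hypothetical helper (an assumption, not part of the original analysis):
# reorder the annotation columns and split off the IDs that had no match.
tidy_annotations <- function(out,
                             cols = c("query", "symbol", "X_id", "name",
                                      "summary", "notfound")) {
  df <- as.data.frame(out, stringsAsFactors = FALSE)
  # Add any expected column that is absent from the result, filled with NA.
  for (col in setdiff(cols, names(df))) df[[col]] <- NA
  df <- df[, cols, drop = FALSE]
  # notfound is TRUE for unmatched IDs and NA for matched ones.
  unmatched <- !is.na(df$notfound) & as.logical(df$notfound)
  list(found = df[!unmatched, , drop = FALSE],
       not_found = df$query[unmatched])
}

# Toy input: two rows copied from the Factor 17 table.
toy <- data.frame(
  symbol   = c("TC2N", NA),
  query    = c("ENSG00000165929", "ENSG00000241732"),
  X_id     = c("123036", NA),
  name     = c("tandem C2 domains, nuclear", NA),
  summary  = c(NA, NA),
  notfound = c(NA, TRUE),
  stringsAsFactors = FALSE
)

res <- tidy_annotations(toy)
print(res$found)      # matched genes, columns in the fixed order
print(res$not_found)  # Ensembl IDs with no annotation
```

In the report itself, `res$found` could be passed to `kable()` in place of `as.data.frame(out)`, so that every factor's table shares one column layout.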

Factor 18 Annotations

out <- mygene::queryMany(gene_list[18, ], scopes = "ensembl.gene",
                         fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id summary query name notfound
TSPAN15 23555 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. The use of alternate polyadenylation sites has been found for this gene. ENSG00000099282 tetraspanin 15 NA
TPSAB1 7177 Tryptases comprise a family of trypsin-like serine proteases, the peptidase family S1. Tryptases are enzymatically active only as heparin-stabilized tetramers, and they are resistant to all known endogenous proteinase inhibitors. Several tryptase genes are clustered on chromosome 16p13.3. These genes are characterized by several distinct features. They have a highly conserved 3’ UTR and contain tandem repeat sequences at the 5’ flank and 3’ UTR which are thought to play a role in regulation of the mRNA stability. These genes have an intron immediately upstream of the initiator Met codon, which separates the site of transcription initiation from protein coding sequence. This feature is characteristic of tryptases but is unusual in other genes. The alleles of this gene exhibit an unusual amount of sequence variation, such that the alleles were once thought to represent two separate genes, alpha and beta 1. Beta tryptases appear to be the main isoenzymes expressed in mast cells; whereas in basophils, alpha tryptases predominate. Tryptases have been implicated as mediators in the pathogenesis of asthma and other allergic and inflammatory disorders. ENSG00000172236 tryptase alpha/beta 1 NA
EDN1 1906 This gene encodes a preproprotein that is proteolytically processed to generate a secreted peptide that belongs to the endothelin/sarafotoxin family. This peptide is a potent vasoconstrictor and its cognate receptors are therapeutic targets in the treatment of pulmonary arterial hypertension. Aberrant expression of this gene may promote tumorigenesis. Alternative splicing results in multiple transcript variants. ENSG00000078401 endothelin 1 NA
MGAT3 4248 There are believed to be over 100 different glycosyltransferases involved in the synthesis of protein-bound and lipid-bound oligosaccharides. The enzyme encoded by this gene transfers a GlcNAc residue to the beta-linked mannose of the trimannosyl core of N-linked oligosaccharides and produces a bisecting GlcNAc. Multiple alternatively spliced variants, encoding the same protein, have been identified. ENSG00000128268 mannosyl (beta-1,4-)-glycoprotein beta-1,4-N-acetylglucosaminyltransferase NA
RNF157 114804 NA ENSG00000141576 ring finger protein 157 NA
NOV 4856 The protein encoded by this gene is a small secreted cysteine-rich protein and a member of the CCN family of regulatory proteins. CCN family proteins associate with the extracellular matrix and play an important role in cardiovascular and skeletal development, fibrosis and cancer development. ENSG00000136999 nephroblastoma overexpressed NA
STC2 8614 This gene encodes a secreted, homodimeric glycoprotein that is expressed in a wide variety of tissues and may have autocrine or paracrine functions. The encoded protein has 10 of its 15 cysteine residues conserved among stanniocalcin family members and is phosphorylated by casein kinase 2 exclusively on its serine residues. Its C-terminus contains a cluster of histidine residues which may interact with metal ions. The protein may play a role in the regulation of renal and intestinal calcium and phosphate transport, cell metabolism, or cellular calcium/phosphate homeostasis. Constitutive overexpression of human stanniocalcin 2 in mice resulted in pre- and postnatal growth restriction, reduced bone and skeletal muscle growth, and organomegaly. Expression of this gene is induced by estrogen and altered in some breast cancers. ENSG00000113739 stanniocalcin 2 NA
FAM107A 11170 NA ENSG00000168309 family with sequence similarity 107 member A NA
CYP17A1 1586 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. It has both 17alpha-hydroxylase and 17,20-lyase activities and is a key enzyme in the steroidogenic pathway that produces progestins, mineralocorticoids, glucocorticoids, androgens, and estrogens. Mutations in this gene are associated with isolated steroid-17 alpha-hydroxylase deficiency, 17-alpha-hydroxylase/17,20-lyase deficiency, pseudohermaphroditism, and adrenal hyperplasia. ENSG00000148795 cytochrome P450 family 17 subfamily A member 1 NA
ADGRD1 283383 The adhesion G-protein-coupled receptors (GPCRs), including GPR133, are membrane-bound proteins with long N termini containing multiple domains. GPCRs, or GPRs, contain 7 transmembrane domains and transduce extracellular signals through heterotrimeric G proteins (summary by Bjarnadottir et al., 2004 [PubMed 15203201]). ENSG00000111452 adhesion G protein-coupled receptor D1 NA
SLCO4A1-AS1 100127888 NA ENSG00000232803 SLCO4A1 antisense RNA 1 NA
SLCO4A1 28231 NA ENSG00000101187 solute carrier organic anion transporter family member 4A1 NA
FABP4 2167 FABP4 encodes the fatty acid binding protein found in adipocytes. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. It is thought that the roles of FABPs include fatty acid uptake, transport, and metabolism. ENSG00000170323 fatty acid binding protein 4 NA
CTD-2562J17.7 ENSG00000254429 NA ENSG00000254429 NA NA
RP11-286H15.1 ENSG00000272789 NA ENSG00000272789 NA NA
AGAP2 116986 The protein encoded by this gene belongs to the centaurin gamma-like family. It mediates anti-apoptotic effects of nerve growth factor by activating nuclear phosphoinositide 3-kinase. It is overexpressed in cancer cells, and promotes cancer cell invasion. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. ENSG00000135439 ArfGAP with GTPase domain, ankyrin repeat and PH domain 2 NA
CTB-13F3.1 ENSG00000273055 NA ENSG00000273055 NA NA
STXBP6 29091 STXBP6 binds components of the SNARE complex (see MIM 603215) and may be involved in regulating SNARE complex formation (Scales et al., 2002 [PubMed 12145319]). ENSG00000168952 syntaxin binding protein 6 NA
COX4I2 84701 Cytochrome c oxidase (COX), the terminal enzyme of the mitochondrial respiratory chain, catalyzes the electron transfer from reduced cytochrome c to oxygen. It is a heteromeric complex consisting of 3 catalytic subunits encoded by mitochondrial genes and multiple structural subunits encoded by nuclear genes. The mitochondrially-encoded subunits function in electron transfer, and the nuclear-encoded subunits may be involved in the regulation and assembly of the complex. This nuclear gene encodes isoform 2 of subunit IV. Isoform 1 of subunit IV is encoded by a different gene, however, the two genes show a similar structural organization. Subunit IV is the largest nuclear encoded subunit which plays a pivotal role in COX regulation. ENSG00000131055 cytochrome c oxidase subunit 4I2 NA
RP11-449J21.5 ENSG00000267128 NA ENSG00000267128 NA NA
RP11-114N19.3 ENSG00000258999 NA ENSG00000258999 NA NA
HIGD1B 51751 This gene encodes a member of the hypoxia inducible gene 1 (HIG1) domain family. The encoded protein is localized to the cell membrane and has been linked to tumorigenesis and the progression of pituitary adenomas. Alternative splicing results in multiple transcript variants. ENSG00000131097 HIG1 hypoxia inducible domain family member 1B NA
MFSD4A 148808 NA ENSG00000174514 major facilitator superfamily domain containing 4A NA
CIDEC 63924 This gene encodes a member of the cell death-inducing DNA fragmentation factor-like effector family. Members of this family play important roles in apoptosis. The encoded protein promotes lipid droplet formation in adipocytes and may mediate adipocyte apoptosis. This gene is regulated by insulin and its expression is positively correlated with insulin sensitivity. Mutations in this gene may contribute to insulin resistant diabetes. A pseudogene of this gene is located on the short arm of chromosome 3. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. ENSG00000187288 cell death inducing DFFA like effector c NA
ITIH4-AS1 100873993 NA ENSG00000239799 ITIH4 antisense RNA 1 NA
ASRGL1 80150 NA ENSG00000162174 asparaginase like 1 NA
ITGB4 3691 Integrins are heterodimers comprised of alpha and beta subunits, that are noncovalently associated transmembrane glycoprotein receptors. Different combinations of alpha and beta polypeptides form complexes that vary in their ligand-binding specificities. Integrins mediate cell-matrix or cell-cell adhesion, and transduced signals that regulate gene expression and cell growth. This gene encodes the integrin beta 4 subunit, a receptor for the laminins. This subunit tends to associate with alpha 6 subunit and is likely to play a pivotal role in the biology of invasive carcinoma. Mutations in this gene are associated with epidermolysis bullosa with pyloric atresia. Multiple alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. ENSG00000132470 integrin subunit beta 4 NA
NA NA NA ENSG00000272016 NA TRUE
COL18A1 80781 This gene encodes the alpha chain of type XVIII collagen. This collagen is one of the multiplexins, extracellular matrix proteins that contain multiple triple-helix domains (collagenous domains) interrupted by non-collagenous domains. A long isoform of the protein has an N-terminal domain that is homologous to the extracellular part of frizzled receptors. Proteolytic processing at several endogenous cleavage sites in the C-terminal domain results in production of endostatin, a potent antiangiogenic protein that is able to inhibit angiogenesis and tumor growth. Mutations in this gene are associated with Knobloch syndrome. The main features of this syndrome involve retinal abnormalities, so type XVIII collagen may play an important role in retinal structure and in neural tube closure. Alternative splicing results in multiple transcript variants. ENSG00000182871 collagen type XVIII alpha 1 chain NA
CTD-2531D15.5 ENSG00000255126 NA ENSG00000255126 NA NA
SYCP2L 221711 NA ENSG00000153157 synaptonemal complex protein 2 like NA
RP11-109D20.2 ENSG00000259352 NA ENSG00000259352 NA NA
PDGFB 5155 This gene encodes a member of the protein family comprised of both platelet-derived growth factors (PDGF) and vascular endothelial growth factors (VEGF). The encoded preproprotein is proteolytically processed to generate platelet-derived growth factor subunit B, which can homodimerize, or alternatively, heterodimerize with the related platelet-derived growth factor subunit A. These proteins bind and activate PDGF receptor tyrosine kinases, which play a role in a wide range of developmental processes. Mutations in this gene are associated with meningioma. Reciprocal translocations between chromosomes 22 and 17, at sites where this gene and that for collagen type 1, alpha 1 are located, are associated with dermatofibrosarcoma protuberans, a rare skin tumor. Alternative splicing results in multiple transcript variants. ENSG00000100311 platelet derived growth factor subunit B NA
RP11-703H8.7 ENSG00000255118 NA ENSG00000255118 NA NA
PPARGC1B 133522 The protein encoded by this gene stimulates the activity of several transcription factors and nuclear receptors, including estrogen receptor alpha, nuclear respiratory factor 1, and glucocorticoid receptor. The encoded protein may be involved in fat oxidation, non-oxidative glucose metabolism, and the regulation of energy expenditure. This protein is downregulated in prediabetic and type 2 diabetes mellitus patients. Certain allelic variations in this gene increase the risk of the development of obesity. Three transcript variants encoding different isoforms have been found for this gene. ENSG00000155846 PPARG coactivator 1 beta NA
USP35 57558 NA ENSG00000118369 ubiquitin specific peptidase 35 NA
GPAT2 150763 NA ENSG00000186281 glycerol-3-phosphate acyltransferase 2, mitochondrial NA
SORD2P ENSG00000259479 NA ENSG00000259479 sorbitol dehydrogenase 2, pseudogene NA
CILP 8483 Major alterations in the composition of the cartilage extracellular matrix occur in joint disease, such as osteoarthrosis. This gene encodes the cartilage intermediate layer protein (CILP), which increases in early osteoarthrosis cartilage. The encoded protein was thought to encode a protein precursor for two different proteins; an N-terminal CILP and a C-terminal homolog of NTPPHase, however, later studies identified no nucleotide pyrophosphatase phosphodiesterase (NPP) activity. The full-length and the N-terminal domain of this protein was shown to function as an IGF-1 antagonist. An allelic variant of this gene has been associated with lumbar disc disease. ENSG00000138615 cartilage intermediate layer protein NA
PPP1R1B 84152 This gene encodes a bifunctional signal transduction molecule. Dopaminergic and glutamatergic receptor stimulation regulates its phosphorylation and function as a kinase or phosphatase inhibitor. As a target for dopamine, this gene may serve as a therapeutic target for neurologic and psychiatric disorders. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000131771 protein phosphatase 1 regulatory inhibitor subunit 1B NA
ANO7P1 ENSG00000237276 NA ENSG00000237276 anoctamin 7 pseudogene 1 NA
CELF2 10659 Members of the CELF/BRUNOL protein family contain two N-terminal RNA recognition motif (RRM) domains, one C-terminal RRM domain, and a divergent segment of 160-230 aa between the second and third RRM domains. Members of this protein family regulate pre-mRNA alternative splicing and may also be involved in mRNA editing, and translation. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000048740 CUGBP, Elav-like family member 2 NA
RP11-16E12.2 ENSG00000259772 NA ENSG00000259772 NA NA
CYP11B1 1584 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the mitochondrial inner membrane and is involved in the conversion of progesterone to cortisol in the adrenal cortex. Mutations in this gene cause congenital adrenal hyperplasia due to 11-beta-hydroxylase deficiency. Transcript variants encoding different isoforms have been noted for this gene. ENSG00000160882 cytochrome P450 family 11 subfamily B member 1 NA
RP11-845C23.3 ENSG00000267396 NA ENSG00000267396 NA NA
KRT16 3868 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains and are clustered in a region of chromosome 17q12-q21. This keratin has been coexpressed with keratin 14 in a number of epithelial tissues, including esophagus, tongue, and hair follicles. Mutations in this gene are associated with type 1 pachyonychia congenita, non-epidermolytic palmoplantar keratoderma and unilateral palmoplantar verrucous nevus. ENSG00000186832 keratin 16 NA
FAM101B 359845 NA ENSG00000183688 family with sequence similarity 101 member B NA
PDE3B 5140 NA ENSG00000152270 phosphodiesterase 3B NA
PANX2 56666 The protein encoded by this gene belongs to the innexin family. Innexin family members are the structural components of gap junctions. This protein and pannexin 1 are abundantly expressed in central nervous system (CNS) and are coexpressed in various neuronal populations. Studies in Xenopus oocytes suggest that this protein alone and in combination with pannexin 1 may form cell type-specific gap junctions with distinct properties. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000073150 pannexin 2 NA
NKD2 85409 This gene encodes a member of a family of proteins that function as negative regulators of Wnt receptor signaling through interaction with Dishevelled family members. The encoded protein participates in the delivery of transforming growth factor alpha-containing vesicles to the cell membrane. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000145506 naked cuticle homolog 2 NA
IL18R1 8809 The protein encoded by this gene is a cytokine receptor that belongs to the interleukin 1 receptor family. This receptor specifically binds interleukin 18 (IL18), and is essential for IL18-mediated signal transduction. IFN-alpha and IL12 are reported to induce the expression of this receptor in NK and T cells. This gene along with four other members of the interleukin 1 receptor family, including IL1R2, IL1R1, IL1RL2 (IL-1Rrp2), and IL1RL1 (T1/ST2), form a gene cluster on chromosome 2q. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000115604 interleukin 18 receptor 1 NA
ANKRD65 441869 NA ENSG00000235098 ankyrin repeat domain 65 NA
MTND1P23 ENSG00000225972 NA ENSG00000225972 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 1 pseudogene 23 NA
SRD5A1 6715 Steroid 5-alpha-reductase (EC 1.3.99.5) catalyzes the conversion of testosterone into the more potent androgen, dihydrotestosterone (DHT). Also see SRD5A2 (MIM 607306). ENSG00000145545 steroid 5 alpha-reductase 1 NA
TNFAIP6 7130 The protein encoded by this gene is a secretory protein that contains a hyaluronan-binding domain, and thus is a member of the hyaluronan-binding protein family. The hyaluronan-binding domain is known to be involved in extracellular matrix stability and cell migration. This protein has been shown to form a stable complex with inter-alpha-inhibitor (I alpha I), and thus enhance the serine protease inhibitory activity of I alpha I, which is important in the protease network associated with inflammation. This gene can be induced by proinflammatory cytokines such as tumor necrosis factor alpha and interleukin-1. Enhanced levels of this protein are found in the synovial fluid of patients with osteoarthritis and rheumatoid arthritis. ENSG00000123610 TNF alpha induced protein 6 NA
RP11-348P10.2 ENSG00000272077 NA ENSG00000272077 NA NA
VWF 7450 This gene encodes a glycoprotein involved in hemostasis. The encoded preproprotein is proteolytically processed following assembly into large multimeric complexes. These complexes function in the adhesion of platelets to sites of vascular injury and the transport of various proteins in the blood. Mutations in this gene result in von Willebrand disease, an inherited bleeding disorder. An unprocessed pseudogene has been found on chromosome 22. ENSG00000110799 von Willebrand factor NA
SFTPA2 729238 This gene is one of several genes encoding pulmonary-surfactant associated proteins (SFTPA) located on chromosome 10. Mutations in this gene and a highly similar gene located nearby, which affect the highly conserved carbohydrate recognition domain, are associated with idiopathic pulmonary fibrosis. The current version of the assembly displays only a single centromeric SFTPA gene pair rather than the two gene pairs shown in the previous assembly which were thought to have resulted from a duplication. ENSG00000185303 surfactant protein A2 NA
ADORA1 134 The protein encoded by this gene is an adenosine receptor that belongs to the G-protein coupled receptor 1 family. There are 3 types of adenosine receptors, each with a specific pattern of ligand binding and tissue distribution, and together they regulate a diverse set of physiologic functions. The type A1 receptors inhibit adenylyl cyclase, and play a role in the fertilization process. Animal studies also suggest a role for A1 receptors in kidney function and ethanol intoxication. Transcript variants with alternative splicing in the 5’ UTR have been found for this gene. ENSG00000163485 adenosine A1 receptor NA
HEY1 23462 This gene encodes a nuclear protein belonging to the hairy and enhancer of split-related (HESR) family of basic helix-loop-helix (bHLH)-type transcriptional repressors. Expression of this gene is induced by the Notch and c-Jun signal transduction pathways. Two similar and redundant genes in mouse are required for embryonic cardiovascular development, and are also implicated in neurogenesis and somitogenesis. Alternative splicing results in multiple transcript variants. ENSG00000164683 hes related family bHLH transcription factor with YRPW motif 1 NA
KBTBD8 84541 NA ENSG00000163376 kelch repeat and BTB domain containing 8 NA
PIK3R3 8503 NA ENSG00000117461 phosphoinositide-3-kinase regulatory subunit 3 NA
SLC16A9 220963 NA ENSG00000165449 solute carrier family 16 member 9 NA
NA NA NA ENSG00000257499 NA TRUE
SORD 6652 Sorbitol dehydrogenase (SORD; EC 1.1.1.14) catalyzes the interconversion of polyols and their corresponding ketoses, and together with aldose reductase (ALDR1; MIM 103880), makes up the sorbitol pathway that is believed to play an important role in the development of diabetic complications (summarized by Carr and Markham, 1995 [PubMed 8535074]). The first reaction of the pathway (also called the polyol pathway) is the reduction of glucose to sorbitol by ALDR1 with NADPH as the cofactor. SORD then oxidizes the sorbitol to fructose using NAD(+) cofactor. ENSG00000140263 sorbitol dehydrogenase NA
LOC400684 400684 NA ENSG00000267213 uncharacterized LOC400684 NA
LIPE 3991 The protein encoded by this gene has a long and a short form, generated by use of alternative translational start codons. The long form is expressed in steroidogenic tissues such as testis, where it converts cholesteryl esters to free cholesterol for steroid hormone production. The short form is expressed in adipose tissue, among others, where it hydrolyzes stored triglycerides to free fatty acids. ENSG00000079435 lipase E, hormone sensitive type NA
CHD7 55636 This gene encodes a protein that contains several helicase family domains. Mutations in this gene have been found in some patients with the CHARGE syndrome. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000171316 chromodomain helicase DNA binding protein 7 NA
SLC40A1 30061 The protein encoded by this gene is a cell membrane protein that may be involved in iron export from duodenal epithelial cells. Defects in this gene are a cause of hemochromatosis type 4 (HFE4). ENSG00000138449 solute carrier family 40 member 1 NA
RP1-95L4.4 ENSG00000217648 NA ENSG00000217648 NA NA
TPD52L1 7164 This gene encodes a member of a family of proteins that contain coiled-coil domains and may form hetero- or homomers. The encoded protein is involved in cell proliferation and calcium signaling. It also interacts with the mitogen-activated protein kinase kinase kinase 5 (MAP3K5/ASK1) and positively regulates MAP3K5-induced apoptosis. Multiple alternatively spliced transcript variants have been observed. ENSG00000111907 tumor protein D52-like 1 NA
RP11-497H17.1 ENSG00000262663 NA ENSG00000262663 NA NA
FAR2 55711 This gene belongs to the short chain dehydrogenase/reductase superfamily. It encodes a reductase enzyme involved in the first step of wax biosynthesis wherein fatty acids are converted to fatty alcohols. The encoded peroxisomal protein utilizes saturated fatty acids of 16 or 18 carbons as preferred substrates. Alternatively spliced transcript variants have been observed for this gene. Related pseudogenes have been identified on chromosomes 2, 14 and 22. ENSG00000064763 fatty acyl-CoA reductase 2 NA
HIBCH 26275 This gene encodes the enzyme responsible for hydrolysis of both HIBYL-CoA and beta-hydroxypropionyl-CoA. Mutations in this gene have been associated with 3-hydroxyisobutyryl-CoA hydrolase deficiency. Alternative splicing results in multiple transcript variants. ENSG00000198130 3-hydroxyisobutyryl-CoA hydrolase NA
MTFR2 113115 NA ENSG00000146410 mitochondrial fission regulator 2 NA
PNMAL1 55228 NA ENSG00000182013 paraneoplastic Ma antigen family-like 1 NA
NA NA NA ENSG00000175898 NA TRUE
SLC12A2 6558 The protein encoded by this gene mediates sodium and chloride transport and reabsorption. The encoded protein is a membrane protein and is important in maintaining proper ionic balance and cell volume. This protein is phosphorylated in response to DNA damage. Three transcript variants encoding two different isoforms have been found for this gene. ENSG00000064651 solute carrier family 12 member 2 NA
NIPSNAP1 8508 This gene encodes a member of the NipSnap family of proteins that may be involved in vesicular transport. A similar protein in mice inhibits the calcium channel TRPV6, and is also localized to the inner mitochondrial membrane where it may play a role in mitochondrial DNA maintenance. A pseudogene of this gene is located on the short arm of chromosome 17. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000184117 nipsnap homolog 1 (C. elegans) NA
TNFRSF21 27242 This gene encodes a member of the tumor necrosis factor receptor superfamily. The encoded protein activates nuclear factor kappa-B and mitogen-activated protein kinase 8 (also called c-Jun N-terminal kinase 1), and induces cell apoptosis. Through its death domain, the encoded receptor interacts with tumor necrosis factor receptor type 1-associated death domain (TRADD) protein, which is known to mediate signal transduction of tumor necrosis factor receptors. Knockout studies in mice suggest that this gene plays a role in T-helper cell activation, and may be involved in inflammation and immune regulation. ENSG00000146072 tumor necrosis factor receptor superfamily member 21 NA
RP11-190A12.8 ENSG00000272668 NA ENSG00000272668 NA NA
RP11-253E3.3 ENSG00000250899 NA ENSG00000250899 NA NA
TMEM14A 28978 NA ENSG00000096092 transmembrane protein 14A NA
TMEM25 84866 NA ENSG00000149582 transmembrane protein 25 NA
NEDD4L 23327 This gene encodes a member of the Nedd4 family of HECT domain E3 ubiquitin ligases. HECT domain E3 ubiquitin ligases transfer ubiquitin from E2 ubiquitin-conjugating enzymes to protein substrates, thus targeting specific proteins for lysosomal degradation. The encoded protein mediates the ubiquitination of multiple target substrates and plays a critical role in epithelial sodium transport by regulating the cell surface expression of the epithelial sodium channel, ENaC. Single nucleotide polymorphisms in this gene may be associated with essential hypertension. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000049759 neural precursor cell expressed, developmentally down-regulated 4-like, E3 ubiquitin protein ligase NA
HIST1H2BD 3017 Histones are basic nuclear proteins that are responsible for the nucleosome structure of the chromosomal fiber in eukaryotes. Nucleosomes consist of approximately 146 bp of DNA wrapped around a histone octamer composed of pairs of each of the four core histones (H2A, H2B, H3, and H4). The chromatin fiber is further compacted through the interaction of a linker histone, H1, with the DNA between the nucleosomes to form higher order chromatin structures. This gene is intronless and encodes a replication-dependent histone that is a member of the histone H2B family. Two transcripts that encode the same protein have been identified for this gene, which is found in the large histone gene cluster on chromosome 6p22-p21.3. ENSG00000158373 histone cluster 1, H2bd NA
RP11-335I12.2 ENSG00000256072 NA ENSG00000256072 NA NA
CADM3 57863 IGSF4B is a brain-specific protein related to the calcium-independent cell-cell adhesion molecules known as nectins (see PVRL3; MIM 607147) (Kakunaga et al., 2005 [PubMed 15741237]). ENSG00000162706 cell adhesion molecule 3 NA
RP11-54F2.1 ENSG00000251196 NA ENSG00000251196 NA NA
THBS4 7060 The protein encoded by this gene belongs to the thrombospondin protein family. Thrombospondin family members are adhesive glycoproteins that mediate cell-to-cell and cell-to-matrix interactions. This protein forms a pentamer and can bind to heparin and calcium. It is involved in local signaling in the developing and adult nervous system, and it contributes to spinal sensitization and neuropathic pain states. This gene is activated during the stromal response to invasive breast cancer. It may also play a role in inflammatory responses in Alzheimer’s disease. Alternative splicing results in multiple transcript variants. ENSG00000113296 thrombospondin 4 NA
RGMB 285704 RGMB is a glycosylphosphatidylinositol (GPI)-anchored member of the repulsive guidance molecule family (see RGMA, MIM 607362) and contributes to the patterning of the developing nervous system (Samad et al., 2005 [PubMed 15671031]). ENSG00000174136 repulsive guidance molecule family member b NA
PCBP3 54039 This gene encodes a member of the KH-domain protein subfamily. Proteins of this subfamily, also referred to as alpha-CPs, bind to RNA with a specificity for C-rich pyrimidine regions. Alpha-CPs play important roles in post-transcriptional activities and have different cellular distributions. This gene’s protein is found in the cytoplasm, yet it lacks the nuclear localization signals found in other subfamily members. Alternative splicing results in multiple transcript variants encoding distinct isoforms. ENSG00000183570 poly(rC) binding protein 3 NA
RP11-16P6.1 ENSG00000261428 NA ENSG00000261428 NA NA
CLDN1 9076 Tight junctions represent one mode of cell-to-cell adhesion in epithelial or endothelial cell sheets, forming continuous seals around cells and serving as a physical barrier to prevent solutes and water from passing freely through the paracellular space. These junctions are comprised of sets of continuous networking strands in the outwardly facing cytoplasmic leaflet, with complementary grooves in the inwardly facing extracytoplasmic leaflet. The protein encoded by this gene, a member of the claudin family, is an integral membrane protein and a component of tight junction strands. Loss of function mutations result in neonatal ichthyosis-sclerosing cholangitis syndrome. ENSG00000163347 claudin 1 NA
KCTD21-AS1 100289388 NA ENSG00000246174 KCTD21 antisense RNA 1 NA
RP11-356B19.11 ENSG00000271833 NA ENSG00000271833 NA NA
LTK 4058 The protein encoded by this gene is a member of the ros/insulin receptor family of tyrosine kinases. Tyrosine-specific phosphorylation of proteins is a key to the control of diverse pathways leading to cell growth and differentiation. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000062524 leukocyte receptor tyrosine kinase NA
FAM134B 54463 The protein encoded by this gene is a cis-Golgi transmembrane protein that may be necessary for the long-term survival of nociceptive and autonomic ganglion neurons. Mutations in this gene are a cause of hereditary sensory and autonomic neuropathy type IIB (HSAN IIB), and this gene may also play a role in susceptibility to vascular dementia. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000154153 family with sequence similarity 134 member B NA
IL12A 3592 This gene encodes a subunit of a cytokine that acts on T and natural killer cells, and has a broad array of biological activities. The cytokine is a disulfide-linked heterodimer composed of the 35-kD subunit encoded by this gene, and a 40-kD subunit that is a member of the cytokine receptor family. This cytokine is required for the T-cell-independent induction of interferon (IFN)-gamma, and is important for the differentiation of both Th1 and Th2 cells. The responses of lymphocytes to this cytokine are mediated by the activator of transcription protein STAT4. Nitric oxide synthase 2A (NOS2A/NOS2) is found to be required for the signaling process of this cytokine in innate immunity. ENSG00000168811 interleukin 12A NA
FBXO16 157574 This gene encodes a member of the F-box protein family, members of which are characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into three classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbx class. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000214050 F-box protein 16 NA
# Save the Ensembl IDs queried for factor 18, one ID per line.
write.table(out$query, paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 18, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
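
The export step above is repeated for every factor with the index typed in by hand. A minimal sketch of how it could be wrapped in a helper is given below. The names `write_factor_genes`, `out_dir`, `demo_ids` and `demo_file` are hypothetical and are not part of the original analysis. The demo writes to a temporary directory rather than to `../utilities/`.

```r
# Hypothetical helper (an assumption, not the original code): writes the
# Ensembl IDs of one factor to gene_names_clus_<k>.txt, one ID per line.
write_factor_genes <- function(ids, k, out_dir) {
  out_file <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(ids, out_file,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(out_file)  # return the path so the caller can check the file
}

# Demo with two IDs taken from the Factor 19 table, written to tempdir().
demo_ids <- c("ENSG00000180209", "ENSG00000130598")
demo_file <- write_factor_genes(demo_ids, 19, tempdir())
readLines(demo_file)
```

In the analysis itself, a call such as `write_factor_genes(out$query, 19, "../utilities/GTEX2013_sparse_fac_voom")` would presumably replace the hand-written `write.table` line for each factor.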

Factor 19 Annotations

# Annotate the top genes of factor 19 (Ensembl gene IDs) with name, summary and symbol.
out <- mygene::queryMany(gene_list[19, ], scopes = "ensembl.gene",
                         fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query X_id name symbol summary notfound
ENSG00000180209 29895 myosin light chain, phosphorylatable, fast skeletal muscle MYLPF NA NA
ENSG00000130598 7136 troponin I2, fast skeletal type TNNI2 This gene encodes a fast-twitch skeletal muscle protein, a member of the troponin I gene family, and a component of the troponin complex including troponin T, troponin C and troponin I subunits. The troponin complex, along with tropomyosin, is responsible for the calcium-dependent regulation of striated muscle contraction. Mouse studies show that this component is also present in vascular smooth muscle and may play a role in regulation of smooth muscle function. In addition to muscle tissues, this protein is found in corneal epithelium, cartilage where it is an inhibitor of angiogenesis to inhibit tumor growth and metastasis, and mammary gland where it functions as a co-activator of estrogen receptor-related receptor alpha. This protein also suppresses tumor growth in human ovarian carcinoma. Mutations in this gene cause myopathy and distal arthrogryposis type 2B. Alternatively spliced transcript variants have been found for this gene. NA
ENSG00000109061 4619 myosin, heavy chain 1, skeletal muscle, adult MYH1 Myosin is a major contractile protein which converts chemical energy into mechanical energy through the hydrolysis of ATP. Myosin is a hexameric protein composed of a pair of myosin heavy chains (MYH) and two pairs of nonidentical light chains. Myosin heavy chains are encoded by a multigene family. In mammals at least 10 different myosin heavy chain (MYH) isoforms have been described from striated, smooth, and nonmuscle cells. These isoforms show expression that is spatially and temporally regulated during development. NA
ENSG00000182676 116729 protein phosphatase 1 regulatory subunit 27 PPP1R27 NA NA
ENSG00000185267 441549 cerebral dopamine neurotrophic factor CDNF NA NA
ENSG00000168530 4632 myosin light chain 1 MYL1 Myosin is a hexameric ATPase cellular motor protein. It is composed of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable regulatory light chains. This gene encodes a myosin alkali light chain expressed in fast skeletal muscle. Two transcript variants have been identified for this gene. NA
ENSG00000260500 ENSG00000260500 NA CTD-3193O13.1 NA NA
ENSG00000260001 100507588 transforming growth factor beta receptor 3 like TGFBR3L NA NA
ENSG00000135074 8728 ADAM metallopeptidase domain 19 ADAM19 This gene encodes a member of the ADAM (a disintegrin and metalloprotease domain) family. Members of this family are membrane-anchored proteins structurally related to snake venom disintegrins and have been implicated in a variety of biological processes involving cell-cell and cell-matrix interactions, including fertilization, muscle development, and neurogenesis. This member is a type I transmembrane protein and serves as a marker for dendritic cell differentiation. It has been demonstrated to be an active metalloproteinase, which may be involved in normal physiological processes such as cell migration, cell adhesion, cell-cell and cell-matrix interactions, and signal transduction. It is proposed to play a role in pathological processes, such as cancer, inflammatory diseases, renal diseases, and Alzheimer’s disease. NA
ENSG00000086967 4606 myosin binding protein C, fast type MYBPC2 This gene encodes a member of the myosin-binding protein C family. This family includes the fast-, slow- and cardiac-type isoforms, each of which is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The protein encoded by this locus is referred to as the fast-type isoform. Mutations in the related but distinct genes encoding the slow-type and cardiac-type isoforms have been associated with distal arthrogryposis, type 1 and hypertrophic cardiomyopathy, respectively. NA
ENSG00000196296 487 ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 1 ATP2A1 This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol to the sarcoplasmic reticulum lumen, and is involved in muscular excitation and contraction. Mutations in this gene cause some autosomal recessive forms of Brody disease, characterized by increasing impairment of muscular relaxation during exercise. Alternative splicing results in three transcript variants encoding different isoforms. NA
ENSG00000101447 81610 family with sequence similarity 83 member D FAM83D NA NA
ENSG00000253549 100996348 CA3 antisense RNA 1 CA3-AS1 NA NA
ENSG00000240045 100507537 uncharacterized LOC100507537 LOC100507537 NA NA
ENSG00000164879 761 carbonic anhydrase 3 CA3 Carbonic anhydrase III (CAIII) is a member of a multigene family (at least six separate genes are known) that encodes carbonic anhydrase isozymes. These carbonic anhydrases are a class of metalloenzymes that catalyze the reversible hydration of carbon dioxide and are differentially expressed in a number of cell types. The expression of the CA3 gene is strictly tissue specific and present at high levels in skeletal muscle and much lower levels in cardiac and smooth muscle. A proportion of carriers of Duchenne muscle dystrophy have a higher CA3 level than normal. The gene spans 10.3 kb and contains seven exons and six introns. NA
ENSG00000183091 4703 nebulin NEB This gene encodes nebulin, a giant protein component of the cytoskeletal matrix that coexists with the thick and thin filaments within the sarcomeres of skeletal muscle. In most vertebrates, nebulin accounts for 3 to 4% of the total myofibrillar protein. The encoded protein contains approximately 30-amino acid long modules that can be classified into 7 types and other repeated modules. Protein isoform sizes vary from 600 to 800 kD due to alternative splicing that is tissue-, species-, and developmental stage-specific. Of the 183 exons in the nebulin gene, at least 43 are alternatively spliced, although exons 143 and 144 are not found in the same transcript. Of the several thousand transcript variants predicted for nebulin, the RefSeq Project has decided to create three representative RefSeq records. Mutations in this gene are associated with recessive nemaline myopathy. NA
ENSG00000165995 783 calcium voltage-gated channel auxiliary subunit beta 2 CACNB2 This gene encodes a subunit of a voltage-dependent calcium channel protein that is a member of the voltage-gated calcium channel superfamily. The gene product was originally identified as an antigen target in Lambert-Eaton myasthenic syndrome, an autoimmune disorder. Mutations in this gene are associated with Brugada syndrome. Alternatively spliced variants encoding different isoforms have been described. NA
ENSG00000101470 7125 troponin C2, fast skeletal type TNNC2 Troponin (Tn), a key protein complex in the regulation of striated muscle contraction, is composed of 3 subunits. The Tn-I subunit inhibits actomyosin ATPase, the Tn-T subunit binds tropomyosin and Tn-C, while the Tn-C subunit binds calcium and overcomes the inhibitory action of the troponin complex on actin filaments. The protein encoded by this gene is the Tn-C subunit. NA
ENSG00000196091 4604 myosin binding protein C, slow type MYBPC1 This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
ENSG00000072110 87 actinin alpha 1 ACTN1 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, cytoskeletal, alpha actinin isoform and maps to the same site as the structurally similar erythroid beta spectrin gene. Three transcript variants encoding different isoforms have been found for this gene. NA
ENSG00000058668 493 ATPase plasma membrane Ca2+ transporting 4 ATP2B4 The protein encoded by this gene belongs to the family of P-type primary ion transport ATPases characterized by the formation of an aspartyl phosphate intermediate during the reaction cycle. These enzymes remove bivalent calcium ions from eukaryotic cells against very large concentration gradients and play a critical role in intracellular calcium homeostasis. The mammalian plasma membrane calcium ATPase isoforms are encoded by at least four separate genes and the diversity of these enzymes is further increased by alternative splicing of transcripts. The expression of different isoforms and splice variants is regulated in a developmental, tissue- and cell type-specific manner, suggesting that these pumps are functionally adapted to the physiological needs of particular cells and tissues. This gene encodes the plasma membrane calcium ATPase isoform 4. Alternatively spliced transcript variants encoding different isoforms have been identified. NA
ENSG00000159173 7135 troponin I1, slow skeletal type TNNI1 Troponin proteins associate with tropomyosin and regulate the calcium sensitivity of the myofibril contractile apparatus of striated muscles. Troponin I (TnI), along with troponin T (TnT) and troponin C (TnC), is one of 3 subunits that form the troponin complex of the thin filaments of striated muscle. TnI is the inhibitory subunit; blocking actin-myosin interactions and thereby mediating striated muscle relaxation. The TnI subfamily contains three genes: TnI-skeletal-fast-twitch, TnI-skeletal-slow-twitch, and TnI-cardiac. The TnI-fast and TnI-slow genes are expressed in fast-twitch and slow-twitch skeletal muscle fibers, respectively, while the TnI-cardiac gene is expressed exclusively in cardiac muscle tissue. This gene encodes the Troponin-I-skeletal-slow-twitch protein. This gene is expressed in cardiac and skeletal muscle during early development but is restricted to slow-twitch skeletal muscle fibers in adults. The encoded protein prevents muscle contraction by inhibiting calcium-mediated conformational changes in actin-myosin complexes. NA
ENSG00000185482 246329 SH3 and cysteine rich domain 3 STAC3 The protein encoded by this gene is a component of the excitation-contraction coupling machinery of muscles. This protein is a member of the Stac gene family and contains an N-terminal cysteine-rich domain and two SH3 domains. Mutations in this gene are a cause of Native American myopathy. NA
ENSG00000125414 4620 myosin, heavy chain 2, skeletal muscle, adult MYH2 Myosins are actin-based motor proteins that function in the generation of mechanical force in eukaryotic cells. Muscle myosins are heterohexamers composed of 2 myosin heavy chains and 2 pairs of nonidentical myosin light chains. This gene encodes a member of the class II or conventional myosin heavy chains, and functions in skeletal muscle contraction. This gene is found in a cluster of myosin heavy chain genes on chromosome 17. A mutation in this gene results in inclusion body myopathy-3. Multiple alternatively spliced variants, encoding the same protein, have been identified. NA
ENSG00000250510 27239 G protein-coupled receptor 162 GPR162 This gene was identified upon genomic analysis of a gene-dense region at human chromosome 12p13. It appears to be mainly expressed in the brain; however, its function is not known. Alternatively spliced transcript variants encoding different isoforms have been identified. NA
ENSG00000257261 ENSG00000257261 NA RP11-96H19.1 NA NA
ENSG00000235475 101929736 long intergenic non-protein coding RNA 1372 LINC01372 NA NA
ENSG00000198732 64093 SPARC related modular calcium binding 1 SMOC1 This gene encodes a multi-domain secreted protein that may have a critical role in ocular and limb development. Mutations in this gene are associated with microphthalmia and limb anomalies. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
ENSG00000170421 3856 keratin 8 KRT8 This gene is a member of the type II keratin family clustered on the long arm of chromosome 12. Type I and type II keratins heteropolymerize to form intermediate-sized filaments in the cytoplasm of epithelial cells. The product of this gene typically dimerizes with keratin 18 to form an intermediate filament in simple single-layered epithelial cells. This protein plays a role in maintaining cellular structural integrity and also functions in signal transduction and cellular differentiation. Mutations in this gene cause cryptogenic cirrhosis. Alternatively spliced transcript variants have been found for this gene. NA
ENSG00000158246 115572 family with sequence similarity 46 member B FAM46B NA NA
ENSG00000232220 ENSG00000232220 NA AC008440.5 NA NA
ENSG00000179820 91663 myeloid-associated differentiation marker MYADM NA NA
ENSG00000172346 27254 cold shock domain containing C2 CSDC2 NA NA
ENSG00000109107 230 aldolase, fructose-bisphosphate C ALDOC This gene encodes a member of the class I fructose-biphosphate aldolase gene family. Expressed specifically in the hippocampus and Purkinje cells of the brain, the encoded protein is a glycolytic enzyme that catalyzes the reversible aldol cleavage of fructose-1,6-biphosphate and fructose 1-phosphate to dihydroxyacetone phosphate and either glyceraldehyde-3-phosphate or glyceraldehyde, respectively. NA
ENSG00000183963 6525 smoothelin SMTN This gene encodes a structural protein that is found exclusively in contractile smooth muscle cells. It associates with stress fibers and constitutes part of the cytoskeleton. This gene is localized to chromosome 22q12.3, distal to the TUPLE1 locus and outside the DiGeorge syndrome deletion. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. NA
ENSG00000196218 6261 ryanodine receptor 1 RYR1 This gene encodes a ryanodine receptor found in skeletal muscle. The encoded protein functions as a calcium release channel in the sarcoplasmic reticulum but also serves to connect the sarcoplasmic reticulum and transverse tubule. Mutations in this gene are associated with malignant hyperthermia susceptibility, central core disease, and minicore myopathy with external ophthalmoplegia. Alternatively spliced transcripts encoding different isoforms have been described. NA
ENSG00000169885 163688 calmodulin like 6 CALML6 NA NA
ENSG00000159840 7791 zyxin ZYX Focal adhesions are actin-rich structures that enable cells to adhere to the extracellular matrix and at which protein complexes involved in signal transduction assemble. Zyxin is a zinc-binding phosphoprotein that concentrates at focal adhesions and along the actin cytoskeleton. Zyxin has an N-terminal proline-rich domain and three LIM domains in its C-terminal half. The proline-rich domain may interact with SH3 domains of proteins involved in signal transduction pathways while the LIM domains are likely involved in protein-protein binding. Zyxin may function as a messenger in the signal transduction pathway that mediates adhesion-stimulated changes in gene expression and may modulate the cytoskeletal organization of actin bundles. Alternative splicing results in multiple transcript variants that encode the same isoform. NA
ENSG00000256271 100874235 CACNA1C antisense RNA 2 CACNA1C-AS2 NA NA
ENSG00000134247 5738 prostaglandin F2 receptor inhibitor PTGFRN NA NA
ENSG00000120549 56243 KIAA1217 KIAA1217 NA NA
ENSG00000127472 5322 phospholipase A2 group V PLA2G5 This gene is a member of the secretory phospholipase A2 family. It is located in a tightly-linked cluster of secretory phospholipase A2 genes on chromosome 1. The encoded enzyme catalyzes the hydrolysis of membrane phospholipids to generate lysophospholipids and free fatty acids including arachidonic acid. It preferentially hydrolyzes linoleoyl-containing phosphatidylcholine substrates. Secretion of this enzyme is thought to induce inflammatory responses in neighboring cells. Alternatively spliced transcript variants have been found, but their full-length nature has not been determined. NA
ENSG00000189058 347 apolipoprotein D APOD This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. NA
ENSG00000232504 ENSG00000232504 ST3GAL5 antisense RNA 1 (head to head) ST3GAL5-AS1 NA NA
ENSG00000197380 147906 dishevelled binding antagonist of beta catenin 3 DACT3 NA NA
ENSG00000117115 11240 peptidyl arginine deiminase 2 PADI2 This gene encodes a member of the peptidyl arginine deiminase family of enzymes, which catalyze the post-translational deimination of proteins by converting arginine residues into citrullines in the presence of calcium ions. The family members have distinct substrate specificities and tissue-specific expression patterns. The type II enzyme is the most widely expressed family member. Known substrates for this enzyme include myelin basic protein in the central nervous system and vimentin in skeletal muscle and macrophages. This enzyme is thought to play a role in the onset and progression of neurodegenerative human disorders, including Alzheimer disease and multiple sclerosis, and it has also been implicated in glaucoma pathogenesis. This gene exists in a cluster with four other paralogous genes. NA
ENSG00000100994 5834 phosphorylase, glycogen; brain PYGB The protein encoded by this gene is a glycogen phosphorylase found predominantly in the brain. The encoded protein forms homodimers which can associate into homotetramers, the enzymatically active form of glycogen phosphorylase. The activity of this enzyme is positively regulated by AMP and negatively regulated by ATP, ADP, and glucose-6-phosphate. This enzyme catalyzes the rate-determining step in glycogen degradation. NA
ENSG00000125503 54776 protein phosphatase 1 regulatory subunit 12C PPP1R12C The gene encodes a subunit of myosin phosphatase. The encoded protein regulates the catalytic activity of protein phosphatase 1 delta and assembly of the actin cytoskeleton. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
ENSG00000177791 58529 myozenin 1 MYOZ1 The protein encoded by this gene is primarily expressed in the skeletal muscle, and belongs to the myozenin family. Members of this family function as calcineurin-interacting proteins that help tether calcineurin to the sarcomere of cardiac and skeletal muscle. They play an important role in modulation of calcineurin signaling. NA
ENSG00000127561 9143 synaptogyrin 3 SYNGR3 This gene encodes an integral membrane protein. The exact function of this protein is unclear, but studies of a similar murine protein suggest that it is a synaptic vesicle protein that also interacts with the dopamine transporter. The gene product belongs to the synaptogyrin gene family. NA
ENSG00000108984 5608 mitogen-activated protein kinase kinase 6 MAP2K6 This gene encodes a member of the dual specificity protein kinase family, which functions as a mitogen-activated protein (MAP) kinase kinase. MAP kinases, also known as extracellular signal-regulated kinases (ERKs), act as an integration point for multiple biochemical signals. This protein phosphorylates and activates p38 MAP kinase in response to inflammatory cytokines or environmental stress. As an essential component of p38 MAP kinase mediated signal transduction pathway, this gene is involved in many cellular processes such as stress induced cell cycle arrest, transcription activation and apoptosis. NA
ENSG00000253250 100127983 chromosome 8 open reading frame 88 C8orf88 NA NA
ENSG00000249863 ENSG00000249863 NA RP11-177C12.1 NA NA
ENSG00000095303 5742 prostaglandin-endoperoxide synthase 1 PTGS1 This is one of two genes encoding similar enzymes that catalyze the conversion of arachinodate to prostaglandin. The encoded protein regulates angiogenesis in endothelial cells, and is inhibited by nonsteroidal anti-inflammatory drugs such as aspirin. Based on its ability to function as both a cyclooxygenase and as a peroxidase, the encoded protein has been identified as a moonlighting protein. The protein may promote cell proliferation during tumor progression. Alternative splicing results in multiple transcript variants. NA
ENSG00000259716 NA NA NA NA TRUE
ENSG00000268707 ENSG00000268707 NA RP11-247A12.7 NA NA
ENSG00000197768 441476 sperm-tail PG-rich repeat containing 3 STPG3 NA NA
ENSG00000166123 84706 glutamic pyruvate transaminase (alanine aminotransferase) 2 GPT2 This gene encodes a mitochondrial alanine transaminase, a pyridoxal enzyme that catalyzes the reversible transamination between alanine and 2-oxoglutarate to generate pyruvate and glutamate. Alanine transaminases play roles in gluconeogenesis and amino acid metabolism in many tissues including skeletal muscle, kidney, and liver. Activating transcription factor 4 upregulates this gene under metabolic stress conditions in hepatocyte cell lines. A loss of function mutation in this gene has been associated with developmental encephalopathy. Alternative splicing results in multiple transcript variants. NA
ENSG00000224424 100506637 PRKAR2A antisense RNA 1 PRKAR2A-AS1 NA NA
ENSG00000170558 1000 cadherin 2 CDH2 This gene encodes a classical cadherin and member of the cadherin superfamily. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein is proteolytically processed to generate a calcium-dependent cell adhesion molecule and glycoprotein. This protein plays a role in the establishment of left-right asymmetry, development of the nervous system and the formation of cartilage and bone. NA
ENSG00000184524 51286 cell cycle exit and neuronal differentiation 1 CEND1 The protein encoded by this gene is a neuron-specific protein. The similar protein in pig enhances neuroblastoma cell differentiation in vitro and may be involved in neuronal differentiation in vivo. Multiple pseudogenes have been reported for this gene. NA
ENSG00000203943 148418 sterile alpha motif domain containing 13 SAMD13 NA NA
ENSG00000109072 7448 vitronectin VTN The protein encoded by this gene is a member of the pexin family. It is found in serum and tissues and promotes cell adhesion and spreading, inhibits the membrane-damaging effect of the terminal cytolytic complement pathway, and binds to several serpin serine protease inhibitors. It is a secreted protein and exists in either a single chain form or a clipped, two chain form held together by a disulfide bond. NA
ENSG00000101335 10398 myosin light chain 9 MYL9 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. NA
ENSG00000205913 100128788 SRRM2 antisense RNA 1 SRRM2-AS1 NA NA
ENSG00000272735 ENSG00000272735 NA RP11-467P9.1 NA NA
ENSG00000165449 220963 solute carrier family 16 member 9 SLC16A9 NA NA
ENSG00000186111 23396 phosphatidylinositol-4-phosphate 5-kinase type 1 gamma PIP5K1C This locus encodes a type I phosphatidylinositol 4-phosphate 5-kinase. The encoded protein catalyzes phosphorylation of phosphatidylinositol 4-phosphate, producing phosphatidylinositol 4,5-bisphosphate. This enzyme is found at synapses and has been found to play roles in endocytosis and cell migration. Mutations at this locus have been associated with lethal congenital contractural syndrome. Alternatively spliced transcript variants encoding different isoforms have been described. NA
ENSG00000217648 ENSG00000217648 NA RP1-95L4.4 NA NA
ENSG00000129204 9098 ubiquitin specific peptidase 6 USP6 NA NA
ENSG00000124406 10396 ATPase phospholipid transporting 8A1 ATP8A1 The P-type adenosinetriphosphatases (P-type ATPases) are a family of proteins which use the free energy of ATP hydrolysis to drive uphill transport of ions across membranes. Several subfamilies of P-type ATPases have been identified. One subfamily catalyzes transport of heavy metal ions. Another subfamily transports non-heavy metal ions (NMHI). The protein encoded by this gene is a member of the third subfamily of P-type ATPases and acts to transport amphipaths, such as phosphatidylserine. Two transcript variants encoding different isoforms have been found for this gene. NA
ENSG00000125753 7408 vasodilator-stimulated phosphoprotein VASP Vasodilator-stimulated phosphoprotein (VASP) is a member of the Ena-VASP protein family. Ena-VASP family members contain an EHV1 N-terminal domain that binds proteins containing E/DFPPPPXD/E motifs and targets Ena-VASP proteins to focal adhesions. In the mid-region of the protein, family members have a proline-rich domain that binds SH3 and WW domain-containing proteins. Their C-terminal EVH2 domain mediates tetramerization and binds both G and F actin. VASP is associated with filamentous actin formation and likely plays a widespread role in cell adhesion and motility. VASP may also be involved in the intracellular signaling pathways that regulate integrin-extracellular matrix interactions. VASP is regulated by the cyclic nucleotide-dependent kinases PKA and PKG. NA
ENSG00000116132 5396 paired related homeobox 1 PRRX1 The DNA-associated protein encoded by this gene is a member of the paired family of homeobox proteins localized to the nucleus. The protein functions as a transcription co-activator, enhancing the DNA-binding activity of serum response factor, a protein required for the induction of genes by growth and differentiation factors. The protein regulates muscle creatine kinase, indicating a role in the establishment of diverse mesodermal muscle types. Alternative splicing yields two isoforms that differ in abundance and expression patterns. NA
ENSG00000159251 70 actin, alpha, cardiac muscle 1 ACTC1 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). NA
ENSG00000155846 133522 PPARG coactivator 1 beta PPARGC1B The protein encoded by this gene stimulates the activity of several transcription factors and nuclear receptors, including estrogen receptor alpha, nuclear respiratory factor 1, and glucocorticoid receptor. The encoded protein may be involved in fat oxidation, non-oxidative glucose metabolism, and the regulation of energy expenditure. This protein is downregulated in prediabetic and type 2 diabetes mellitus patients. Certain allelic variations in this gene increase the risk of the development of obesity. Three transcript variants encoding different isoforms have been found for this gene. NA
ENSG00000188191 5575 protein kinase cAMP-dependent type I regulatory subunit beta PRKAR1B The protein encoded by this gene is a regulatory subunit of cyclic AMP-dependent protein kinase A (PKA), which is involved in the signaling pathway of the second messenger cAMP. Two regulatory and two catalytic subunits form the PKA holoenzyme, disbands after cAMP binding. The holoenzyme is involved in many cellular events, including ion transport, metabolism, and transcription. Several transcript variants encoding the same protein have been found for this gene. NA
ENSG00000112658 6722 serum response factor SRF This gene encodes a ubiquitous nuclear protein that stimulates both cell proliferation and differentiation. It is a member of the MADS (MCM1, Agamous, Deficiens, and SRF) box superfamily of transcription factors. This protein binds to the serum response element (SRE) in the promoter region of target genes. This protein regulates the activity of many immediate-early genes, for example c-fos, and thereby participates in cell cycle regulation, apoptosis, cell growth, and cell differentiation. This gene is the downstream target of many pathways; for example, the mitogen-activated protein kinase pathway (MAPK) that acts through the ternary complex factors (TCFs). Two transcript variants encoding different isoforms have been found for this gene. NA
ENSG00000114200 590 butyrylcholinesterase BCHE Mutant alleles at the BCHE locus are responsible for suxamethonium sensitivity. Homozygous persons sustain prolonged apnea after administration of the muscle relaxant suxamethonium in connection with surgical anesthesia. The activity of pseudocholinesterase in the serum is low and its substrate behavior is atypical. In the absence of the relaxant, the homozygote is at no known disadvantage. NA
ENSG00000186076 ENSG00000186076 NA RP11-887P2.3 NA NA
ENSG00000234807 ENSG00000234807 long intergenic non-protein coding RNA 1135 LINC01135 NA NA
ENSG00000142227 2014 epithelial membrane protein 3 EMP3 The protein encoded by this gene belongs to the PMP-22/EMP/MP20 family of proteins. The protein contains four transmembrane domains and two N-linked glycosylation sites. It is thought to be involved in cell proliferation, cell-cell interactions and function as a tumor suppressor. Alternative splicing results in multiple transcript variants. NA
ENSG00000162433 205 adenylate kinase 4 AK4 This gene encodes a member of the adenylate kinase family of enzymes. The encoded protein is localized to the mitochondrial matrix. Adenylate kinases regulate the adenine and guanine nucleotide compositions within a cell by catalyzing the reversible transfer of phosphate group among these nucleotides. Five isozymes of adenylate kinase have been identified in vertebrates. Expression of these isozymes is tissue-specific and developmentally regulated. A pseudogene for this gene has been located on chromosome 17. Three transcript variants encoding the same protein have been identified for this gene. Sequence alignment suggests that the gene defined by NM_013410, NM_203464, and NM_001005353 is located on chromosome 1. NA
ENSG00000163322 84142 family with sequence similarity 175 member A FAM175A NA NA
ENSG00000163126 200539 ankyrin repeat domain 23 ANKRD23 This gene is a member of the muscle ankyrin repeat protein (MARP) family and encodes a protein with four tandem ankyrin-like repeats. The protein is localized to the nucleus, functioning as a transcriptional regulator. Expression of this protein is induced during recovery following starvation. NA
ENSG00000137831 55075 uveal autoantigen with coiled-coil domains and ankyrin repeats UACA NA NA
ENSG00000171314 5223 phosphoglycerate mutase 1 PGAM1 The protein encoded by this gene is a mutase that catalyzes the reversible reaction of 3-phosphoglycerate (3-PGA) to 2-phosphoglycerate (2-PGA) in the glycolytic pathway. Two transcript variants encoding different isoforms have been found for this gene. NA
ENSG00000243926 ENSG00000243926 TIPARP antisense RNA 1 TIPARP-AS1 NA NA
ENSG00000115112 29842 transcription factor CP2-like 1 TFCP2L1 NA NA
ENSG00000172361 220136 cilia and flagella associated protein 53 CFAP53 This gene belongs to the CFAP53 family. It was found to be differentially expressed by the ciliated cells of frog epidermis and in skin fibroblasts from human. Mutations in this gene are associated with visceral heterotaxy-6, which implicates this gene in determination of left-right asymmetric patterning. NA
ENSG00000144218 3899 AF4/FMR2 family member 3 AFF3 This gene encodes a tissue-restricted nuclear transcriptional activator that is preferentially expressed in lymphoid tissue. Isolation of this protein initially defined a highly conserved LAF4/MLLT2 gene family of nuclear transcription factors that may function in lymphoid development and oncogenesis. In some ALL patients, this gene has been found fused to the gene for MLL. Multiple alternatively spliced transcript variants that encode different proteins have been found for this gene. NA
ENSG00000245864 ENSG00000245864 NA CTC-467M3.1 NA NA
ENSG00000106631 58498 myosin light chain 7 MYL7 NA NA
ENSG00000178821 339456 transmembrane protein 52 TMEM52 NA NA
ENSG00000182310 147650 sperm acrosome associated 6 SPACA6 NA NA
ENSG00000205090 339453 transmembrane protein 240 TMEM240 This gene encodes a transmembrane-domain containing protein found in the brain and cerebellum. Mutations in this gene result in spinocerebellar ataxia 21. NA
ENSG00000168016 9881 tetratricopeptide repeat and ankyrin repeat containing 1 TRANK1 NA NA
ENSG00000248485 654790 Purkinje cell protein 4 like 1 PCP4L1 NA NA
ENSG00000265168 ENSG00000265168 NA RP11-192H23.5 NA NA
ENSG00000116741 5997 regulator of G-protein signaling 2 RGS2 Regulator of G protein signaling (RGS) family members are regulatory molecules that act as GTPase activating proteins (GAPs) for G alpha subunits of heterotrimeric G proteins. RGS proteins are able to deactivate G protein subunits of the Gi alpha, Go alpha and Gq alpha subtypes. They drive G proteins into their inactive GDP-bound forms. Regulator of G protein signaling 2 belongs to this family. The protein acts as a mediator of myeloid differentiation and may play a role in leukemogenesis. NA
ENSG00000260572 ENSG00000260572 NA RP11-16N11.2 NA NA
# Save the Ensembl gene IDs queried for factor 19, one per line.
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 19, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
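
The export above is repeated by hand for each factor, with the factor index hard-coded. A loop could cover all factors in one pass. The following is a minimal sketch, not the code used in this analysis: `write_factor_gene_lists`, the toy matrix, and the temporary output directory are hypothetical stand-ins, and it assumes `gene_list` is a character matrix with one row per factor. It writes the IDs from `gene_list` directly rather than from `out$query`.

```r
# Hypothetical helper: write one gene-ID file per factor (one row of gene_list each).
write_factor_gene_lists <- function(gene_list, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  paths <- character(nrow(gene_list))
  for (k in seq_len(nrow(gene_list))) {
    paths[k] <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
    # One Ensembl ID per line, no header, no quotes.
    write.table(gene_list[k, ], paths[k],
                col.names = FALSE, row.names = FALSE, quote = FALSE)
  }
  paths
}

# Toy input (assumption): 2 factors x 2 genes, IDs taken from the tables in this document.
toy_gene_list <- rbind(c("ENSG00000165995", "ENSG00000101470"),
                       c("ENSG00000272556", "ENSG00000175592"))
toy_dir <- file.path(tempdir(), "gene_names_demo")
written <- write_factor_gene_lists(toy_gene_list, toy_dir)
```

With the real data, the call would take the full `gene_list` and the `../utilities/GTEX2013_sparse_fac_voom` directory in place of the toy arguments.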

Factor 20 Annotations

# Annotate the top genes of factor 20 with name, summary, and symbol via mygene.
out <- mygene::queryMany(gene_list[20, ], scopes = "ensembl.gene", fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
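
Query terms with no match appear in the result as rows with `notfound` set to `TRUE` (for example ENSG00000259716 in the preceding table). It may be useful to separate those rows before tabulating. The sketch below uses a small hand-built data frame as a stand-in for `as.data.frame(out)`. The names `ann`, `missing_ids`, and `ann_found` are illustrative and are not part of the original analysis.

```r
# Toy stand-in (assumption) for as.data.frame(out); values copied from the tables above.
ann <- data.frame(
  query    = c("ENSG00000165995", "ENSG00000259716", "ENSG00000257261"),
  symbol   = c("CACNB2", NA, "RP11-96H19.1"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# `%in% TRUE` treats NA as FALSE, so only explicit misses are flagged.
is_missing  <- ann$notfound %in% TRUE
missing_ids <- ann$query[is_missing]              # IDs with no annotation hit
ann_found   <- ann[!is_missing, , drop = FALSE]   # rows that were annotated
```

The same filter would apply to the real `out` object, assuming it carries a `notfound` column as in the tables shown here.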
kable(as.data.frame(out))
symbol query X_id name summary notfound
GTF2IP13 ENSG00000272556 ENSG00000272556 general transcription factor IIi pseudogene 13 NA NA
FOSL1 ENSG00000175592 8061 FOS like 1, AP-1 transcription factor subunit The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. Several transcript variants encoding different isoforms have been found for this gene. NA
CCER2 ENSG00000262484 643669 coiled-coil glutamate rich protein 2 NA NA
CEBPA ENSG00000245848 1050 CCAAT/enhancer binding protein alpha This intronless gene encodes a transcription factor that contains a basic leucine zipper (bZIP) domain and recognizes the CCAAT motif in the promoters of target genes. The encoded protein functions in homodimers and also heterodimers with CCAAT/enhancer-binding proteins beta and gamma. Activity of this protein can modulate the expression of genes involved in cell cycle regulation as well as in body weight homeostasis. Mutation of this gene is associated with acute myeloid leukemia. The use of alternative in-frame non-AUG (GUG) and AUG start codons results in protein isoforms with different lengths. Differential translation initiation is mediated by an out-of-frame, upstream open reading frame which is located between the GUG and the first AUG start codons. NA
CTD-3025N20.3 ENSG00000272010 ENSG00000272010 NA NA NA
AC017101.10 ENSG00000227227 ENSG00000227227 NA NA NA
IPO7P2 ENSG00000225674 ENSG00000225674 importin 7 pseudogene 2 NA NA
GLI1 ENSG00000111087 2735 GLI family zinc finger 1 This gene encodes a member of the Kruppel family of zinc finger proteins. The encoded transcription factor is activated by the sonic hedgehog signal transduction cascade and regulates stem cell proliferation. The activity and nuclear localization of this protein is negatively regulated by p53 in an inhibitory loop. Multiple transcript variants encoding different isoforms have been found for this gene. NA
NAMPTP1 ENSG00000229644 ENSG00000229644 nicotinamide phosphoribosyltransferase pseudogene 1 NA NA
GPRC5A ENSG00000013588 9052 G protein-coupled receptor class C group 5 member A This gene encodes a member of the type 3 G protein-coupling receptor family, characterized by the signature 7-transmembrane domain motif. The encoded protein may be involved in interaction between retinoid acid and G protein signalling pathways. Retinoic acid plays a critical role in development, cellular growth, and differentiation. This gene may play a role in embryonic development and epithelial cell differentiation. NA
NR4A3 ENSG00000119508 8013 nuclear receptor subfamily 4 group A member 3 This gene encodes a member of the steroid-thyroid hormone-retinoid receptor superfamily. The encoded protein may act as a transcriptional activator. The protein can efficiently bind the NGFI-B Response Element (NBRE). Three different versions of extraskeletal myxoid chondrosarcomas (EMCs) are the result of reciprocal translocations between this gene and other genes. The translocation breakpoints are associated with Nuclear Receptor Subfamily 4, Group A, Member 3 (on chromosome 9) and either Ewing Sarcome Breakpoint Region 1 (on chromosome 22), RNA Polymerase II, TATA Box-Binding Protein-Associated Factor, 68-KD (on chromosome 17), or Transcription factor 12 (on chromosome 15). Multiple transcript variants encoding different isoforms have been found for this gene. NA
CCDC150 ENSG00000144395 284992 coiled-coil domain containing 150 NA NA
CIDEC ENSG00000187288 63924 cell death inducing DFFA like effector c This gene encodes a member of the cell death-inducing DNA fragmentation factor-like effector family. Members of this family play important roles in apoptosis. The encoded protein promotes lipid droplet formation in adipocytes and may mediate adipocyte apoptosis. This gene is regulated by insulin and its expression is positively correlated with insulin sensitivity. Mutations in this gene may contribute to insulin resistant diabetes. A pseudogene of this gene is located on the short arm of chromosome 3. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. NA
FXYD1 ENSG00000266964 5348 FXYD domain containing ion transport regulator 1 This gene encodes a member of a family of small membrane proteins that share a 35-amino acid signature sequence domain, beginning with the sequence PFXYD and containing 7 invariant and 6 highly conserved amino acids. The approved human gene nomenclature for the family is FXYD-domain containing ion transport regulator. Mouse FXYD5 has been termed RIC (Related to Ion Channel). FXYD2, also known as the gamma subunit of the Na,K-ATPase, regulates the properties of that enzyme. FXYD1 (phospholemman), FXYD2 (gamma), FXYD3 (MAT-8), FXYD4 (CHIF), and FXYD5 (RIC) have been shown to induce channel activity in experimental expression systems. Transmembrane topology has been established for two family members (FXYD1 and FXYD2), with the N-terminus extracellular and the C-terminus on the cytoplasmic side of the membrane. The protein encoded by this gene is a plasma membrane substrate for several kinases, including protein kinase A, protein kinase C, NIMA kinase, and myotonic dystrophy kinase. It is thought to form an ion channel or regulate ion channel activity. Transcript variants with different 5’ UTR sequences have been described in the literature. NA
CTD-2527I21.4 ENSG00000221857 ENSG00000221857 NA NA NA
CYP1A1 ENSG00000140465 1543 cytochrome P450 family 1 subfamily A member 1 This gene, CYP1A1, encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum and its expression is induced by some polycyclic aromatic hydrocarbons (PAHs), some of which are found in cigarette smoke. The enzyme’s endogenous substrate is unknown; however, it is able to metabolize some PAHs to carcinogenic intermediates. The gene has been associated with lung cancer risk. A related family member, CYP1A2, is located approximately 25 kb away from CYP1A1 on chromosome 15. Alternative splicing results in multiple transcript variants encoding distinct isoforms. NA
SRXN1 ENSG00000271303 140809 sulfiredoxin 1 NA NA
ACTG1P17 ENSG00000259315 283693 actin gamma 1 pseudogene 17 NA NA
KRTAP5-9 ENSG00000254997 3846 keratin associated protein 5-9 NA NA
CTD-2517M22.14 ENSG00000255182 ENSG00000255182 NA NA NA
ZNF770 ENSG00000198146 54989 zinc finger protein 770 NA NA
RP11-618G20.1 ENSG00000258964 ENSG00000258964 NA NA NA
VLDLR-AS1 ENSG00000236404 401491 VLDLR antisense RNA 1 NA NA
GPT ENSG00000167701 2875 glutamic-pyruvate transaminase (alanine aminotransferase) This gene encodes cytosolic alanine aminotransaminase 1 (ALT1); also known as glutamate-pyruvate transaminase 1. This enzyme catalyzes the reversible transamination between alanine and 2-oxoglutarate to generate pyruvate and glutamate and, therefore, plays a key role in the intermediary metabolism of glucose and amino acids. Serum activity levels of this enzyme are routinely used as a biomarker of liver injury caused by drug toxicity, infection, alcohol, and steatosis. A related gene on chromosome 16 encodes a putative mitochondrial alanine aminotransaminase. NA
UBE2V1P2 ENSG00000214192 ENSG00000214192 ubiquitin conjugating enzyme E2 variant 1 pseudogene 2 NA NA
RP11-134K13.4 ENSG00000271967 ENSG00000271967 NA NA NA
NPM1P39 ENSG00000225159 ENSG00000225159 nucleophosmin 1 (nucleolar phosphoprotein B23, numatrin) pseudogene 39 NA NA
RP11-1096G20.5 ENSG00000266368 ENSG00000266368 NA NA NA
RARRES1 ENSG00000118849 5918 retinoic acid receptor responder 1 This gene was identified as a retinoid acid (RA) receptor-responsive gene. It encodes a type 1 membrane protein. The expression of this gene is upregulated by tazarotene as well as by retinoic acid receptors. The expression of this gene is found to be downregulated in prostate cancer, which is caused by the methylation of its promoter and CpG island. Alternatively spliced transcript variant encoding distinct isoforms have been observed. NA
RP11-130L8.2 ENSG00000269976 ENSG00000269976 NA NA NA
IFFO2 ENSG00000169991 126917 intermediate filament family orphan 2 NA NA
MTPN ENSG00000105887 136319 myotrophin The transcript produced from this gene is bi-cistronic and can encode both myotrophin and leucine zipper protein 6. The myotrophin protein is associated with cardiac hypertrophy, where it is involved in the conversion of NFkappa B p50-p65 heterodimers to p50-p50 and p65-p65 homodimers. This protein also has a potential function in cerebellar morphogenesis, and it may be involved in the differentiation of cerebellar neurons, particularly of granule cells. A cryptic ORF at the 3’ end of this transcript uses a novel internal ribosome entry site and a non-AUG translation initiation codon to produce leucine zipper protein 6, a 6.4 kDa tumor antigen that is associated with myeloproliferative disease. NA
RP11-299M14.2 ENSG00000255343 ENSG00000255343 NA NA NA
NA ENSG00000273075 NA NA NA TRUE
SLC35E1 ENSG00000127526 79939 solute carrier family 35 member E1 NA NA
FAM229A ENSG00000225828 100128071 family with sequence similarity 229 member A NA NA
SMARCA5-AS1 ENSG00000245112 ENSG00000245112 SMARCA5 antisense RNA 1 NA NA
HIF1A ENSG00000100644 3091 hypoxia inducible factor 1 alpha subunit This gene encodes the alpha subunit of transcription factor hypoxia-inducible factor-1 (HIF-1), which is a heterodimer composed of an alpha and a beta subunit. HIF-1 functions as a master regulator of cellular and systemic homeostatic response to hypoxia by activating transcription of many genes, including those involved in energy metabolism, angiogenesis, apoptosis, and other genes whose protein products increase oxygen delivery or facilitate metabolic adaptation to hypoxia. HIF-1 thus plays an essential role in embryonic vascularization, tumor angiogenesis and pathophysiology of ischemic disease. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. NA
CAHM ENSG00000270419 100526820 colon adenocarcinoma hypermethylated (non-protein coding) NA NA
GSDMB ENSG00000073605 55876 gasdermin B This gene encodes a member of the gasdermin-domain containing protein family. Other gasdermin-family genes are implicated in the regulation of apoptosis in epithelial cells, and are linked to cancer. Multiple transcript variants encoding different isoforms have been found for this gene. Additional variants have been described, but they are candidates for nonsense-mediated mRNA decay (NMD) and are unlikely to be protein-coding. NA
TBX15 ENSG00000092607 6913 T-box 15 This gene belongs to the T-box family of genes, which encode a phylogenetically conserved family of transcription factors that regulate a variety of developmental processes. All these genes contain a common T-box DNA-binding domain. Mutations in this gene are associated with Cousin syndrome. NA
ARG2 ENSG00000081181 384 arginase 2 Arginase catalyzes the hydrolysis of arginine to ornithine and urea. At least two isoforms of mammalian arginase exists (types I and II) which differ in their tissue distribution, subcellular localization, immunologic crossreactivity and physiologic function. The type II isoform encoded by this gene, is located in the mitochondria and expressed in extra-hepatic tissues, especially kidney. The physiologic role of this isoform is poorly understood; it is thought to play a role in nitric oxide and polyamine metabolism. Transcript variants of the type II gene resulting from the use of alternative polyadenylation sites have been described. NA
NPM1P6 ENSG00000213881 ENSG00000213881 nucleophosmin 1 (nucleolar phosphoprotein B23, numatrin) pseudogene 6 NA NA
MIR3661 ENSG00000266751 100500905 microRNA 3661 microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop. NA
RP1-117B12.4 ENSG00000253102 ENSG00000253102 NA NA NA
AZGP1 ENSG00000160862 563 alpha-2-glycoprotein 1, zinc-binding NA NA
RP11-46F15.2 ENSG00000238260 ENSG00000238260 NA NA NA
RP4-791M13.3 ENSG00000254539 ENSG00000254539 NA NA NA
NAMPT ENSG00000105835 10135 nicotinamide phosphoribosyltransferase This gene encodes a protein that catalyzes the condensation of nicotinamide with 5-phosphoribosyl-1-pyrophosphate to yield nicotinamide mononucleotide, one step in the biosynthesis of nicotinamide adenine dinucleotide. The protein belongs to the nicotinic acid phosphoribosyltransferase (NAPRTase) family and is thought to be involved in many important biological processes, including metabolism, stress response and aging. This gene has a pseudogene on chromosome 10. NA
DDX21 ENSG00000165732 9188 DEAD-box helicase 21 DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of this family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. This gene encodes a DEAD box protein, which is an antigen recognized by autoimmune antibodies from a patient with watermelon stomach disease. This protein unwinds double-stranded RNA, folds single-stranded RNA, and may play important roles in ribosomal RNA biogenesis, RNA editing, RNA transport, and general transcription. NA
RP11-457M11.5 ENSG00000261584 ENSG00000261584 NA NA NA
AC025442.3 ENSG00000253744 ENSG00000253744 NA NA NA
KIAA1683 ENSG00000130518 80726 KIAA1683 NA NA
RP11-458D21.1 ENSG00000233396 ENSG00000233396 NA NA NA
LRRC59 ENSG00000108829 55379 leucine rich repeat containing 59 NA NA
TWF1P1 ENSG00000178082 ENSG00000178082 twinfilin 1 pseudogene 1 NA NA
ZNF426 ENSG00000130818 79088 zinc finger protein 426 Kaposi’s sarcoma-associated herpesvirus (KSHV) can be reactivated from latency by the viral protein RTA. The protein encoded by this gene is a zinc finger transcriptional repressor that interacts with RTA to modulate RTA-mediated reactivation of KSHV. While the encoded protein can repress KSHV reactivation, RTA can induce degradation of this protein through the ubiquitin-proteasome pathway to overcome the repression. Several transcript variants encoding different isoforms have been found for this gene. NA
NRBP2 ENSG00000185189 340371 nuclear receptor binding protein 2 NA NA
RP11-127B20.3 ENSG00000272677 ENSG00000272677 NA NA NA
RP11-299G20.2 ENSG00000259172 ENSG00000259172 NA NA NA
YWHAZP3 ENSG00000229932 ENSG00000229932 tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, zeta pseudogene 3 NA NA
CTD-2373J6.1 ENSG00000260871 ENSG00000260871 NA NA NA
NA ENSG00000269942 NA NA NA TRUE
RP11-16E12.2 ENSG00000259772 ENSG00000259772 NA NA NA
SNHG11 ENSG00000174365 128439 small nucleolar RNA host gene 11 This gene is a member of the non-protein-coding multiple snoRNA host gene family. Two snoRNAs are derived from the introns of this host gene. Although many alternative splice variants have been observed, the gene is thought to have no protein-coding potential. NA
HESX1 ENSG00000163666 8820 HESX homeobox 1 This gene encodes a conserved homeobox protein that is a transcriptional repressor in the developing forebrain and pituitary gland. Mutations in this gene are associated with septooptic dysplasia, HESX1-related growth hormone deficiency, and combined pituitary hormone deficiency. NA
RP11-561C5.4 ENSG00000229212 ENSG00000229212 NA NA NA
CTC-336P14.1 ENSG00000271228 ENSG00000271228 NA NA NA
AC016722.4 ENSG00000228925 ENSG00000228925 NA NA NA
RP11-1277A3.3 ENSG00000272459 ENSG00000272459 NA NA NA
HSPH1 ENSG00000120694 10808 heat shock protein family H (Hsp110) member 1 NA NA
MAPK13 ENSG00000156711 5603 mitogen-activated protein kinase 13 This gene encodes a member of the mitogen-activated protein (MAP) kinase family. MAP kinases act as an integration point for multiple biochemical signals, and are involved in a wide variety of cellular processes such as proliferation, differentiation, transcription regulation and development. The encoded protein is a p38 MAP kinase and is activated by proinflammatory cytokines and cellular stress. Substrates of the encoded protein include the transcription factor ATF2 and the microtubule dynamics regulator stathmin. Alternatively spliced transcript variants have been observed for this gene. NA
AKR7L ENSG00000211454 ENSG00000211454 aldo-keto reductase family 7-like (gene/pseudogene) NA NA
RP5-867C24.5 ENSG00000261872 ENSG00000261872 NA NA NA
NA ENSG00000267167 NA NA NA TRUE
TDGP1 ENSG00000255725 ENSG00000255725 thymine-DNA glycosylase pseudogene 1 NA NA
MIR3652 ENSG00000265072 100500842 microRNA 3652 microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop. NA
FOSL2 ENSG00000075426 2355 FOS like 2, AP-1 transcription factor subunit The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. NA
GNRH1 ENSG00000147437 2796 gonadotropin releasing hormone 1 This gene encodes a preproprotein that is proteolytically processed to generate a peptide that is a member of the gonadotropin-releasing hormone (GnRH) family of peptides. Alternative splicing results in multiple transcript variants, at least one of which is secreted and then cleaved to generate gonadoliberin-1 and GnRH-associated peptide 1. Gonadoliberin-1 stimulates the release of luteinizing and follicle stimulating hormones, which are important for reproduction. Mutations in this gene are associated with hypogonadotropic hypogonadism. NA
PRSS1 ENSG00000204983 5644 protease, serine 1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. NA
RDH5 ENSG00000135437 5959 retinol dehydrogenase 5 This gene encodes an enzyme belonging to the short-chain dehydrogenases/reductases (SDR) family. This retinol dehydrogenase functions to catalyze the final step in the biosynthesis of 11-cis retinaldehyde, which is the universal chromophore of visual pigments. Mutations in this gene cause autosomal recessive fundus albipunctatus, a rare form of night blindness that is characterized by a delay in the regeneration of cone and rod photopigments. Alternative splicing results in multiple transcript variants. Read-through transcription also exists between this gene and the neighboring upstream BLOC1S1 (biogenesis of lysosomal organelles complex-1, subunit 1) gene. NA
CTD-2035E11.5 ENSG00000272144 ENSG00000272144 NA NA NA
NA ENSG00000272365 NA NA NA TRUE
TMEM133 ENSG00000170647 83935 transmembrane protein 133 There is evidence that this intronless gene is transcribed but the protein is predicted. The gene function is unknown. NA
AC005540.3 ENSG00000235852 ENSG00000235852 NA NA NA
NA ENSG00000261252 NA NA NA TRUE
PDE6C ENSG00000095464 5146 phosphodiesterase 6C This gene encodes the alpha-prime subunit of cone phosphodiesterase, which is composed of a homodimer of two alpha-prime subunits and 3 smaller proteins of 11, 13, and 15 kDa. Mutations in this gene are associated with cone dystrophy type 4 (COD4). NA
LINC01089 ENSG00000212694 338799 long intergenic non-protein coding RNA 1089 NA NA
DPF3 ENSG00000205683 8110 double PHD fingers 3 This gene encodes a member of the D4 protein family. The encoded protein is a transcription regulator that binds acetylated histones and is a component of the BAF chromatin remodeling complex. Alternate splicing results in multiple transcript variants encoding different isoforms. NA
NR1H3 ENSG00000025434 10062 nuclear receptor subfamily 1 group H member 3 The protein encoded by this gene belongs to the NR1 subfamily of the nuclear receptor superfamily. The NR1 family members are key regulators of macrophage function, controlling transcriptional programs involved in lipid homeostasis and inflammation. This protein is highly expressed in visceral organs, including liver, kidney and intestine. It forms a heterodimer with retinoid X receptor (RXR), and regulates expression of target genes containing retinoid response elements. Studies in mice lacking this gene suggest that it may play an important role in the regulation of cholesterol homeostasis. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
RAB37 ENSG00000172794 326624 RAB37, member RAS oncogene family Rab proteins are low molecular mass GTPases that are critical regulators of vesicle trafficking. For additional background information on Rab proteins, see MIM 179508. NA
GPR84 ENSG00000139572 53831 G protein-coupled receptor 84 NA NA
RP11-333E13.2 ENSG00000250568 ENSG00000250568 NA NA NA
RP11-862L9.3 ENSG00000266844 ENSG00000266844 NA NA NA
ZSWIM4 ENSG00000132003 65249 zinc finger SWIM-type containing 4 NA NA
C2orf82 ENSG00000182600 389084 chromosome 2 open reading frame 82 NA NA
HSP90AA2P ENSG00000224411 ENSG00000224411 heat shock protein 90kDa alpha family class A member 2, pseudogene NA NA
AC079305.10 ENSG00000222043 ENSG00000222043 NA NA NA
LOC171391 ENSG00000255284 171391 uncharacterized LOC171391 NA NA
RP11-408O19.5 ENSG00000271631 ENSG00000271631 NA NA NA
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 20, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
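
Several rows in the table above have no symbol and carry TRUE in the last column, which suggests that mygene found no match for those Ensembl IDs. The sketch below shows one way to separate matched from unmatched queries before writing the ID list. It runs on a small toy data frame (`out_demo`, a hypothetical stand-in for `out`) and assumes the last column is named `notfound`; the header is not visible in this part of the output, so check `colnames(out)` first.

```r
# Toy stand-in for the `out` table returned by mygene::queryMany().
# Column names are assumptions; only three rows from the table above are used.
out_demo <- data.frame(
  symbol   = c("CIDEC", NA, "GPT"),
  query    = c("ENSG00000187288", "ENSG00000273075", "ENSG00000167701"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# `notfound` is TRUE for unmatched queries and NA otherwise,
# so guard against NA before using it as a logical index.
is_missing <- !is.na(out_demo$notfound) & out_demo$notfound

matched_ids   <- out_demo$query[!is_missing]
unmatched_ids <- out_demo$query[is_missing]

# Write only the matched IDs; a temporary file keeps the sketch side-effect free.
out_file <- tempfile(fileext = ".txt")
writeLines(matched_ids, out_file)
```

Whether to drop the unmatched IDs depends on the downstream tool: the original `write.table` call keeps every query, which is the safer choice if the enrichment tool resolves Ensembl IDs itself.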